Abstract: A system and a method are provided for generating (28) a schema (22) for a
relational database corresponding to a document having a document-type definition (18)
and data complying with the document-type definition. The document-type definition (18)
has content particles representative of the structure of the document data, including one
or more of the following content particles: elements, attributes of elements, nesting
relationships between elements, grouping relationships between elements, schema ordering
indicators, existence indicators, occurrence indicators and element ID referencing
indicators. The method described herein also contemplates loading the data into the
relational database in a manner consistent with the relational schema (22). Metadata is
extracted from the document-type definition (18) representative of the document-type
definition. The schema (22) for the relational database is generated from the metadata,
wherein at least one table is thereby defined in the relational database corresponding to
at least one content particle of the document-type definition (18). The document data is
loaded into the at least one table of the relational database according to the relational
schema (22) in a manner driven by the metadata.

SYSTEM AND METHOD FOR AUTOMATIC LOADING OF AN XML DOCUMENT INTO A
RELATIONAL DATABASE INCLUDING THE GENERATION OF A
RELATIONAL SCHEMA THEREFOR

Background of the Invention

Related Applications

[0001] This application claims priority of U.S. Provisional Application No. 60/182,939
entitled "METHOD AND APPARATUS FOR AUTOMATIC LOADING OF XML
DOCUMENTS INTO RELATIONAL DATABASES," filed February 16, 2000.

Field of the Invention

[0002] The invention relates to a method and system for automatically loading an
extensible markup language (XML) document, as validated by a document-type definition
(DTD), into a relational database.

Description of the Related Art

[0003] Touted as the ASCII of the future, Extensible Markup Language (XML) is used
to define markups for information modeling and exchange in many industries. By enabling
automatic data flow between businesses, XML is contributing to efforts that are pushing
the world into the electronic commerce (e-commerce) era. It is envisioned that collection,
analysis, and management of XML data will be tremendously important tasks for the era
of e-commerce. XML data, i.e., data surrounded by an initiating tag (e.g., <tag>) and a
terminating tag (e.g., </tag>), can be validated by a document-type definition (DTD) as
will be hereinafter described. As can be seen, boldface text is used to describe XML and
DTD contents as well as names for table and document tags and fields.

[0004] Some background on XML and DTDs may be helpful in understanding the
difficulties present in importing XML data into a relational database. XML is currently
used both for defining document markups (and, thus, information modeling) and for data
exchange. XML documents are composed of character data and nested tags used to
document semantics of the embedded text. Tags can be used freely in an XML document
(as long as their use conforms to the XML specification) or can be used in accordance
with document-type definitions (DTDs) for which an XML document declares itself in
conformance. An XML document that conforms to a DTD is referred to as a valid XML
document.

[0005] A DTD is used to define allowable structures of elements (i.e., it defines allowable
tags and tag structure) in a valid XML document. A DTD can basically include four kinds of
declarations: element types, attribute lists, notations, and entity declarations.

[0006] An element type declaration is analogous to a data type definition; it names an
element and defines the allowable content and structure. An element may contain only
other elements (referred to as element content) or may contain any mix of other elements
and text; such mixed content is represented as PCDATA. An EMPTY element type
declaration is used to name an element type without content (it can be used, for example,
to define a placeholder for attributes). Finally, an element type can be declared with
content ANY, meaning the type (content and structure) of the element is arbitrary.

[0007] Attribute-list declarations define attributes of an element type. The declaration
includes attribute names, default values and types, such as CDATA, NOTATION, and
ENUMERATION. Two special types of attributes, ID and IDREF, are used to define
references between elements. An ID attribute is used to uniquely identify the element; an
IDREF attribute can be used to reference that element (it should be noted that an IDREFS
attribute can reference multiple elements). ENTITY declarations facilitate flexible
organization of XML documents by breaking the documents into multiple storage units.
A NOTATION declaration identifies non-XML content in XML documents. It is assumed
herein that one skilled in the art of XML documents that include a DTD is familiar with
the above terminology.
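
The ID and IDREFS relationship described above can be illustrated with a minimal Python sketch. The sample document, the attribute name `authorIDs`, and the helper `resolve_idrefs` are assumptions made only for illustration; they are not taken from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Element names follow Example 1; the
# attribute name "authorIDs" and all values are assumed for illustration.
SAMPLE_XML = """
<article>
  <title>XML and Relational Databases</title>
  <author id="a1"><name><lastname>Smith</lastname></name></author>
  <author id="a2"><name><lastname>Jones</lastname></name></author>
  <contactauthors authorIDs="a1 a2"/>
</article>
"""


def resolve_idrefs(root, id_attr="id", idrefs_attr="authorIDs"):
    """Return (referencing element, [referenced elements]) pairs."""
    # Index every element that carries an ID attribute.
    by_id = {el.get(id_attr): el for el in root.iter() if el.get(id_attr) is not None}
    pairs = []
    for el in root.iter():
        refs = el.get(idrefs_attr)
        if refs is not None:
            # An IDREFS value is a whitespace-separated list of IDs.
            pairs.append((el, [by_id[ref] for ref in refs.split()]))
    return pairs


pairs = resolve_idrefs(ET.fromstring(SAMPLE_XML.strip()))
```

Here the single `contactauthors` element resolves to both `author` elements, which is the multi-element reference that an IDREFS attribute permits.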

[0008] Element and attribute declarations define the structure of compliant XML
documents and the relationships among the embedded XML data items. ENTITY
declarations, on the other hand, are used for physical organization of a DTD or XML
document (similar to macros and inclusions in many programming languages and word
processing documents). For purposes of the present invention, it has been assumed that
entity declarations can be substituted or expanded to give an equivalent DTD with only
element type and attribute-list declarations, since they do not provide information
pertinent to modeling of the data (this can be referred to as a logical DTD). In the
discussion that follows, DTD is used to refer to a logical DTD. The logical DTD in
Example 1 below (for books, articles and authors) is used throughout for illustration.
Example 1: DTD for Books, Articles, and Authors.

<!ELEMENT book (booktitle, (author* | editor))>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT article (title, (author, affiliation?)+, contactauthors?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT contactauthors EMPTY>
<!ATTLIST contactauthors authorIDs IDREFS #REQUIRED>
<!ELEMENT monograph (title, author, editor)>
<!ELEMENT editor (book | monograph)*>
<!ATTLIST editor name CDATA #IMPLIED>
<!ELEMENT author (name)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT affiliation ANY>
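
As an illustration of how the declarations of a logical DTD such as Example 1 might be separated into element type and attribute-list declarations, the following Python sketch uses two regular expressions. The DTD text is a cleaned-up reading of Example 1 in which the attribute name `authorIDs` is an assumption, and the function `parse_dtd` is hypothetical. It handles only one attribute per ATTLIST declaration and is not a general DTD parser.

```python
import re

# Cleaned-up reading of Example 1. The attribute name "authorIDs" is an
# assumption; it is not legible in the source.
EXAMPLE_DTD = """
<!ELEMENT book (booktitle, (author* | editor))>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT article (title, (author, affiliation?)+, contactauthors?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT contactauthors EMPTY>
<!ATTLIST contactauthors authorIDs IDREFS #REQUIRED>
<!ELEMENT monograph (title, author, editor)>
<!ELEMENT editor (book | monograph)*>
<!ATTLIST editor name CDATA #IMPLIED>
<!ELEMENT author (name)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT affiliation ANY>
"""

# Element declaration: name followed by its content model.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)\s*>", re.S)
# Attribute-list declaration with exactly one attribute: element, name, type, default.
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*>")


def parse_dtd(text):
    """Return (elements, attributes) for a logical DTD given as text."""
    elements = {name: model for name, model in ELEMENT_RE.findall(text)}
    attributes = [
        {"element": element, "name": name, "type": attr_type, "default": default}
        for element, name, attr_type, default in ATTLIST_RE.findall(text)
    ]
    return elements, attributes


elements, attributes = parse_dtd(EXAMPLE_DTD)
```

The resulting `elements` dictionary keeps the declaration order, so the eighth element-or-attribute line of the DTD (the `editor` declaration) remains easy to locate when following the discussion.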

[0009] The task of developing a relational schema for XML documents requires
understanding the components of, and relationships within, such documents. A DTD
defines a structure for XML documents that can be seen as an ordered graph composed of
element type declarations. A DTD has the following properties (also referred to herein as
data and/or content particles):

[0010] Grouping: Within an element type definition, elements that are associated within
parentheses participate in a grouping relationship, and are defined as a group. This
relationship can be further classified as sequence grouping (wherein the elements are
separated by a comma ',') or choice grouping (wherein the elements are separated by a
vertical bar '|') according to the operator used as a delimiter in the grouping.

[0011] Nesting: An element type definition provides the mechanism for modeling
relationships that can be represented structurally as a hierarchy of elements. These
relationships are referred to as a nesting relationship (a structural view can be chosen to
avoid having to choose particular semantics for all such relationships).

[0012] Schema Ordering: The logical ordering among element types is specified in a DTD.
For example, line one of the DTD in Example 1 specifies that in a book, a booktitle
precedes a list of authors (or an editor).

[0013] Existence: In a DTD, an element type with no content declares the existence of an
element with no structure or value for that element type. This kind of virtual element is
declared so that attributes can be defined for unique properties of the element or for
element referencing relationships.

[0014] Occurrence: Occurrence indicators (i.e., '?', '*' and '+') indicate optional
occurrence or repetition of a content particle in an element definition. For example, in
Example 1, the grouping (book | monograph) can appear zero or more times in an editor
element, which is indicated by the '*' following this grouping in line eight of the DTD in
Example 1.

[0015] Element Referencing: Elements reference one another using attributes of type ID
and IDREF(S). For example, in Example 1, contactauthors has an element reference
relationship with author.
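
The grouping and occurrence properties listed above can be read directly off a content model. The following Python sketch, offered only as an illustration, splits one parenthesized group into its top-level particles, its grouping operator and its trailing occurrence indicator. The function name `split_top_level` and the `GROUPING` table are hypothetical, and the sketch assumes a well-formed content model whose outer parentheses enclose the whole group.

```python
# Names for the two grouping operators described in the text.
GROUPING = {",": "sequence", "|": "choice"}


def split_top_level(model):
    """Split a content model into (particles, operator, occurrence indicator)."""
    model = model.strip()
    occurrence = ""
    if model and model[-1] in "?*+":
        occurrence, model = model[-1], model[:-1].strip()
    if not (model.startswith("(") and model.endswith(")")):
        # A bare keyword or name such as EMPTY or ANY has no grouping.
        return [model], None, occurrence
    particles, operator, depth, current = [], None, 0, ""
    for ch in model[1:-1]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if depth == 0 and ch in ",|":
            # A delimiter outside any nested group separates two particles.
            operator = ch
            particles.append(current.strip())
            current = ""
        else:
            current += ch
    particles.append(current.strip())
    return particles, operator, occurrence
```

For line one of Example 1, `split_top_level("(booktitle, (author* | editor))")` yields a sequence group whose second particle is itself a choice group; calling the function again on that nested particle exposes the nesting one level at a time.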

[0016] These properties and relationships illustrate that a DTD not only defines element
types for conforming XML documents, but also provides valuable information about the
structure of an XML document and relationships between its elements. Besides that,
while the DTD specifies a schema ordering between element types thereof, XML
documents also have a physical ordering, or data ordering, of data elements. The
challenges of mapping a DTD to a relational database model arise from a mismatch
between (1) types of properties and relationships embedded in DTDs and (2) those
modeled by relational database models.

[0017] Turning to the task of loading XML data (as validated by a DTD) into a relational
database, the prior art in database systems must be considered. Database systems are
traditional, well-known tools for managing data. After many years of development,
database technology has matured and contributed significantly to the rapid growth of
business and industry. Relational database systems are a proven technology for managing
business data and are used everywhere by various sizes of companies for their critical
business tasks. Commercial relational database products embody years of research and
development in modeling, storage, retrieval, update, indexing, transaction processing, and
concurrency control, and continue to add capabilities to address new kinds of data, such
as multimedia.

[0018] With more and more data flowing in XML formats, there have been attempts to
extend prior art relational database systems to accommodate XML data. Such an
approach has avoided re-inventing database technology to suit XML data but, more
importantly, takes best advantage of the power of relational database technology and the
wealth of experience in optimizing and using the technology.

[0019] There have been several problems to overcome in bringing XML data into a
relational database for management, including defining a relational schema for the XML
data, loading the XML data into a relational database, and transforming XML queries
(whether formulated in extensible stylesheet language [XSL], XML Query Language
[XML-QL] or other XML query standards) into meaningful structured query language
[SQL] queries.

[0020] Prior attempts to solve these problems have fallen short of an efficient and,
preferably, automatic way to import XML data into a relational database schema.
Current industry enterprise database management system (DBMS) vendors, such as DB2
and Oracle 8i, provide XML extensions for bringing XML data into a relational database.
However, these methods are far from automatic. These vendors require users to manually
design the relational schema for a given DTD and to define the mapping between the
DTD and the user-designed schema for the loading of XML documents. While this
manual approach can be straightforward, and these vendors provide tools to assist the
user, users must be very familiar with the XML data, the DTD therefor and the particular
database system used.

[0021] In addition, the prior art approach only works well for generating a relatively
simple relational schema, and is not effective or efficient when the data contains complex
relationships between tables. It is most appropriate for a small number of short or
familiar DTDs. The known approach also requires experts on both XML and relational
techniques. For more complex DTDs and a more robust relational schema, the manual
approach becomes more difficult and requires specialized expertise. The common
existence of non-straightforward relational database schema cases requires a more
advanced approach to generating the relational schema and defining the load mapping
definition.

[0022] Other attempts to conceive a method that automatically loads XML data into a
relational database have also proven to be of limited success. In accordance with these
failed attempts, the user is required either to mine XML documents or to simplify the
DTD. In either case, semantics captured by the DTD or XML documents are lost, e.g.,
how elements may be grouped within the data or as defined by the DTD, and how to
distinguish between DTD-specific symbols, such as * and +, etc.

[0023] Other prior art attempts to load XML data into a relational database schema
include one method by which a large amount of data is placed into relational databases by
creating a relational data schema and applying data mining over a large number of the same
type of XML documents, and then abstracting a relational data schema out of them.
Then, the data is loaded into the relational tables. For parts that cannot fit into the
relational table schema, an overflow graph is generated.

[0024] Others have done benchmark testing on a relational schema generated out of XML
data by four variations of a basic mapping approach, but this work does not consider or
require a DTD.

[0025] However, one known form of benchmark testing that does consider a DTD was
performed by Shanmugasundaram et al., who investigated schema conversion techniques
for mapping a DTD to a relational schema by giving (and comparing) three alternative
methods based on traversal of a DTD tree and creation of element trees. This process
simplified the DTDs and then mapped those into a relational schema. While this
approach also works with simple XML data structures, portions of the structure,
properties and embedded relationships among elements in XML documents are often lost.

[0026] Instead of bringing the semi-structured data into the relational model, there are
other approaches to bring the XML data into semi-structured, object-oriented, or object-relational
database management systems (DBMS). Some commercial relational DBMSs,
e.g., IBM's DB2 and Oracle, have begun to incorporate XML techniques into their
databases, e.g., IBM Alphaworks visual XML tools, IBM DB2 XML Extender, and
Oracle 8i.

[0027] Recently, IBM's Alphaworks proposed a new set of visual XML tools that can
visually create and view DTDs and XML documents. In its tools, IBM has proposed the
idea of breaking DTDs into elements, notations, and entities that use components
grouped with properties sequential, choice, attribute, and relationship and with repetition
properties to construct DTDs. Tools to do XML translation and XML generation from
SQL queries are provided. However, a method by which to load the XML data into
relational tables was not addressed in this prior art.

[0028] The DB2 XML Extender can store, compose and search XML documents.
Document storing is accomplished in two ways. First, the XML document is stored as a
whole for indexing, referred to as storing the XML document as an XML column.
Second, pieces of XML data are stored into table(s), referred to as XML collections.

[0029] Oracle 8i is an XML-enabled database server that can do XML document reading
and writing in the document's native format. An XML document is stored as data and
distributed across nested-relational tables. This XML SQL utility provides a method to
load the XML documents in a canonical form into a preexisting schema that users
manually (and previously) designed.

Summary of the Invention

[0030] According to the invention, a relational schema definition is examined for XML
data, a relational schema is created out of a DTD, and XML data is loaded into the
generated relational schema that adheres to the DTD. In this manner, the data semantics
implied by the XML are maintained so that more accurate and efficient management of
the data can be performed.

[0031] Starting with a DTD for an XML document containing data (rather than analyzing
the relationships between the actual elements of the XML data), all of the information in
the DTD is captured into metadata tables, and then the metadata tables are queried to
generate the relational schema. Then, the data contained in the XML document can be
loaded into the generated relational schema. This method can be described by three broad
steps:

[0032] First, the DTD is stored as metadata (i.e., a transformation and/or recasting of the
data contained in the DTD) in tables, i.e., the metadata is used to describe the
information of the DTD associated with the XML documents. This approach provides
flexibility to manipulate the DTD by standard SQL queries.

[0033] Second, a relational schema is generated from the metadata stored in the metadata
tables. This step provides additional flexibility in that, although the relational schema can
be directly mapped from the metadata tables according to the invention, the metadata
tables can also be queried to do optimizing or restructuring on the metadata tables
representative of the XML data structure stored in the DTD.

[0034] Third, data contained in the XML document is loaded into the tables as defined by
the relational schema (which is generated in the previous step), by using the associated
metadata tables.
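
The first of the three steps, and the flexibility of manipulating the DTD with standard SQL, can be sketched as follows in Python with an in-memory SQLite database. The table name `item_metadata`, its columns, and both function names are assumptions chosen for illustration; the specification does not prescribe them at this point.

```python
import sqlite3


def store_items(conn, elements):
    """Step one: store one metadata row per element type declared in the DTD."""
    conn.execute(
        "CREATE TABLE item_metadata (id INTEGER PRIMARY KEY, name TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO item_metadata (name, content) VALUES (?, ?)", elements.items()
    )


def items_without_children(conn):
    """Query the stored DTD with plain SQL: element types that nest no elements."""
    rows = conn.execute(
        "SELECT name FROM item_metadata "
        "WHERE content IN ('EMPTY', '(#PCDATA)') ORDER BY name"
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
store_items(
    conn,
    {
        "book": "(booktitle, (author* | editor))",
        "booktitle": "(#PCDATA)",
        "contactauthors": "EMPTY",
    },
)
leaf_items = items_without_children(conn)
```

Once the DTD is held in tables like this, restructuring or optimizing it before schema generation becomes a matter of issuing further queries against the same tables.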

[0035] According to the invention, the inventive metadata-driven approach includes the
following beneficial characteristics:

[0036] For Storing: All of the information contained in the DTD is captured in the
metadata tables. It is anticipated that the XML document and the DTD can be
reconstructed from the relational data and metadata as needed.

[0037] For Mapping: The generated relational schema is rich in relationships that are
useful in processing queries.

[0038] For Loading: Mappings between the XML document and the final relational
schema are captured in the metadata tables. It is contemplated that the relational data can
be synchronized with XML data, which means whenever there is a data update in the
relational data, the effect is also reflected in the XML data, and vice versa.

[0039] In one aspect, the invention relates to a method for generating a schema for a
relational database corresponding to a document having a document-type definition and
data complying with the document-type definition. The document-type definition has
content particles representative of the structure of the document data including one or
more of the following content particles: elements, attributes of elements, nesting
relationships between elements, grouping relationships between elements, schema
ordering indicators, existence indicators, occurrence indicators and element ID referencing
indicators. The method also contemplates loading the data into the relational database in a
manner consistent with the relational schema.

[0040] The method comprises the steps of: extracting metadata from the document-type
definition representative of the document-type definition; generating the schema for the
relational database from the metadata, wherein at least one table is thereby defined in the
relational database corresponding to at least one content particle of the document-type
definition via the metadata, and at least one column is defined in each of the at least one
table corresponding to another of at least one content particle of the document-type
definition; and loading the document data into the at least one table of the relational
database according to the relational schema.

[0041] The extracting step of the inventive method can further comprise the steps of:
generating an item metadata table corresponding to element type content particles in the
document-type definition; creating at least one default item in the item metadata table;
generating a row in the item metadata table corresponding to each of the element type
content particles of the document-type definition; generating an attribute metadata table
corresponding to attribute type content particles in the document-type definition;
creating a default attribute value in the attribute metadata table corresponding to any
default items in the item metadata table; generating a row in the attribute metadata table
corresponding to each of the attribute type content particles of each element type stored
in the item metadata table; generating a nesting metadata table corresponding to nesting
relationship content particles in the document-type definition; and generating a row in the
nesting metadata table corresponding to each relationship between items identified in the
item metadata table.
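
A minimal sketch of the extracting step is given below, assuming the element and attribute declarations are already available as simple Python structures. The function `extract_metadata`, the row layouts, and the choice of a character-data (PCDATA) row as the default item are assumptions made for illustration; the default attribute value mentioned above is omitted for brevity.

```python
import re

# Matches XML names inside a content model (a simplification of the XML Name rule).
NAME_RE = re.compile(r"[A-Za-z_][\w.\-]*")


def extract_metadata(elements, attributes):
    """Build item, attribute and nesting metadata rows from parsed declarations.

    elements:   dict mapping element name to its content model text
    attributes: list of dicts with keys "element", "name" and "type"
    """
    # Default item, assumed here to stand for character data.
    items = [{"id": 1, "name": "PCDATA", "type": "default"}]
    for name in elements:
        items.append({"id": len(items) + 1, "name": name, "type": "element"})
    item_id = {row["name"]: row["id"] for row in items}

    # One attribute row per declared attribute, linked to its item.
    attribute_rows = [
        {"item_id": item_id[a["element"]], "name": a["name"], "type": a["type"]}
        for a in attributes
    ]

    # One nesting row per parent/child relationship found in a content model.
    nesting_rows = []
    for parent, model in elements.items():
        for child in NAME_RE.findall(model):
            if child in item_id:  # skips keywords such as EMPTY and ANY
                nesting_rows.append({"parent": item_id[parent], "child": item_id[child]})
    return items, attribute_rows, nesting_rows
```

Keeping the three results separate mirrors the three metadata tables named in the text, so each can later be stored in or queried from its own relational table.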

[0042] In some embodiments of the invention, the generated nesting table row can
indicate the cardinality between a pair of items. The cardinality can be one of one-to-one
and one-to-many. The generated nesting table row can indicate a relationship between a
parent item and a child item. The generated nesting table row can indicate the position of
the child item in a definition of the parent item.
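
The nesting row described above can be sketched in Python as follows. The rule used to derive cardinality (a `*` or `+` occurrence indicator gives one-to-many, anything else gives one-to-one) is an assumption made for this illustration, as are the function name `nesting_rows` and the row keys. The sketch handles only a flat group without nested parentheses.

```python
import re


def nesting_rows(parent, content_model):
    """Build nesting rows for a flat group such as '(firstname?, lastname)'."""
    inner = content_model.strip()[1:-1]  # drop the enclosing parentheses
    rows = []
    for position, particle in enumerate(re.split(r"[,|]", inner), start=1):
        particle = particle.strip()
        indicator = particle[-1] if particle[-1] in ("?", "*", "+") else ""
        rows.append(
            {
                "parent": parent,
                "child": particle.rstrip("?*+"),
                # Position of the child within the parent's definition.
                "position": position,
                # Assumed rule: repetition indicators imply one-to-many.
                "cardinality": "one-to-many" if indicator in ("*", "+") else "one-to-one",
                "optional": indicator in ("?", "*"),
            }
        )
    return rows
```

For the `name` element of Example 1, this produces two one-to-one rows, with `firstname` at position 1 marked optional and `lastname` at position 2.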

[0043] In other embodiments of the invention, the generating step can further comprise
the steps of: creating a table in the schema of the relational database corresponding to
each row of the metadata item table; generating at least one default field in the table of the
schema; altering the schema of the relational database to add a column to each table in the
schema corresponding to each row of the metadata attribute table related to the particular
metadata item table row; altering the tables in the schema of the relational database to add
links between tables in the schema corresponding to a relationship identified in each row
of the metadata nesting table; altering the tables in the schema of the relational database
by adding a foreign key to a parent table if the identified relationship is a one-to-one
relationship; and altering the tables in the schema of the relational database by adding a
foreign key to a child table if the identified relationship is a one-to-many relationship.
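
The generating step can be sketched as a function that turns the three kinds of metadata rows into SQL statements. This Python example targets SQLite; the default key field `iid`, the column naming pattern, and the function names are assumptions for illustration only.

```python
import sqlite3


def generate_ddl(items, attributes, nesting):
    """Return SQL statements that build a schema from metadata rows.

    items:      list of item (element type) names
    attributes: list of (item name, attribute name) pairs
    nesting:    list of (parent, child, cardinality) triples
    """
    ddl = []
    for item in items:
        # One table per item row, with an assumed default key field "iid".
        ddl.append(f'CREATE TABLE "{item}" (iid INTEGER PRIMARY KEY)')
    for item, attr in attributes:
        # One column per attribute row related to that item.
        ddl.append(f'ALTER TABLE "{item}" ADD COLUMN "{attr}" TEXT')
    for parent, child, cardinality in nesting:
        if cardinality == "one-to-one":
            # One-to-one: the foreign key is added to the parent table.
            ddl.append(
                f'ALTER TABLE "{parent}" ADD COLUMN "{child}_iid" '
                f'INTEGER REFERENCES "{child}"(iid)'
            )
        else:
            # One-to-many: the foreign key is added to the child table.
            ddl.append(
                f'ALTER TABLE "{child}" ADD COLUMN "{parent}_iid" '
                f'INTEGER REFERENCES "{parent}"(iid)'
            )
    return ddl


def build_schema(conn, items, attributes, nesting):
    """Execute the generated statements against an open SQLite connection."""
    for statement in generate_ddl(items, attributes, nesting):
        conn.execute(statement)
```

Placing the key in the child table for one-to-many relationships lets any number of child rows point at the same parent row, which a single column in the parent table could not express.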
[0044] In additional embodiments of this invention, the loading step can further comprise
the steps of: initializing a link table; determining whether each item in the metadata
nesting table contains a group type; initializing a pattern-mapping table; directly mapping
a link into the link table for each item in the metadata nesting table that does not contain a
group type; creating an additional link table containing a mapping of a link pattern for
each group type identified in the metadata item table; retrieving a preselected set of rows
corresponding to each item in the metadata item table; mapping a create tuple loading
action in the pattern mapping table corresponding to each item in the item metadata table;
mapping an update tuple loading action in the pattern mapping table corresponding to
each attribute in the attribute metadata table; mapping a create tuple loading action in the
pattern mapping table corresponding to each group in a link; mapping an assign action
tuple loading action in the pattern mapping table corresponding to each pair in the same
link corresponding to each link in the link pattern table; forming a tree structure with
the document data; and traversing the formed tree and updating the at least one relational
database table according to the rows of the pattern mapping table.
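
The final two loading steps (forming a tree from the document data and traversing it to produce tuples) can be illustrated with the heavily simplified Python sketch below. It does not reproduce the link table or the pattern mapping table; instead, a set of element names stands in for the "create tuple" actions and each XML attribute is applied as an "update tuple" action. The function `load_document`, the row keys `iid`, `parent` and `value`, and the sample data are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count


def load_document(xml_text, element_tables):
    """Walk the document tree and emit one tuple per element that has a table."""
    tables = defaultdict(list)
    next_id = count(1)

    def visit(node, parent_key):
        key = parent_key
        if node.tag in element_tables:
            # "Create tuple" for the element itself.
            row = {"iid": next(next_id), "parent": parent_key}
            # "Update tuple" for each attribute present on the element.
            row.update(node.attrib)
            text = (node.text or "").strip()
            if text:
                row["value"] = text
            tables[node.tag].append(row)
            key = (node.tag, row["iid"])
        for child in node:
            visit(child, key)

    visit(ET.fromstring(xml_text), None)
    return dict(tables)


loaded = load_document(
    '<author id="a1"><name><lastname>Smith</lastname></name></author>',
    {"author", "name", "lastname"},
)
```

The returned dictionary maps each table name to its list of rows, which could then be written to the generated relational tables with ordinary INSERT statements.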
[0045] In yet further embodiments of this invention, the method can also comprise the
step of optimizing the metadata. This optimizing step can further comprise the steps of:
eliminating duplicate particle references in the metadata; and simplifying references to
corresponding elements, links and attributes in the metadata.

[0046] In other aspects of the invention, a system is provided for generating a schema for
a relational database corresponding to a document having a document-type definition and
data complying with the document-type definition and loading the data into the relational
database in a manner consistent with the relational schema.

[0047] The key inventive features of the system include an extractor adapted to read a
document-type definition that extracts metadata from the document-type definition
representative of the document-type definition; a generator operably interconnected to
the extractor for generating the schema for the relational database from the metadata,
wherein at least one table is thereby defined in the relational database corresponding to at
least one content particle of the document-type definition via the metadata, and at least
one column is defined in each of the tables corresponding to another content particle of
the document-type definition; and a loader operably interconnected to the generator for
loading the document data into the table(s) of the relational database according to the
relational schema.

[0048] In various embodiments of the extractor for the system, the extractor can generate
an item metadata table corresponding to element type content particles in the
document-type definition. The extractor can create at least one default item in the item metadata
table. The extractor can generate a row in the item metadata table corresponding to each
of the element type content particles of the document-type definition. The extractor can
generate an attribute metadata table corresponding to attribute type content particles in
the document-type definition. The extractor can generate a row in the attribute metadata
table corresponding to each of the attribute type content particles of each element type
stored in the item metadata table. The extractor can generate a nesting metadata table
corresponding to nesting relationship content particles in the document-type definition.
The extractor can generate a row in the nesting metadata table corresponding to each
relationship between items identified in the item metadata table.

[0049] In various embodiments of the generator for the system described herein, the
generator can create a table in the schema of the relational database corresponding to each
row of the metadata item table. The generator can alter the schema of the relational
database to add a column to each table in the schema corresponding to each row of the
metadata attribute table related to the particular metadata item table row. The generator
can alter the tables in the schema of the relational database to add links between tables in
the schema corresponding to a relationship identified in each row of the metadata nesting
table. The generator can alter the tables in the schema of the relational database by adding
a foreign key to a parent table if a relationship identified between a pair of tables is a
one-to-one relationship. The generator can alter the tables in the schema of the relational
database by adding a foreign key to a child table if a relationship identified between a pair
of tables is a one-to-many relationship.

[0050] In various aspects of the loader of this system, the loader can initialize a link table
and/or a pattern-mapping table. The loader can determine whether each item in the
metadata nesting table contains a group type content particle. The loader can directly
map a link into the link table for each item in the metadata nesting table that does not
contain a group type. The loader can create an additional link table containing a mapping
of a link pattern for each group type identified in the metadata item table. The loader can
retrieve a preselected set of rows corresponding to each item in the metadata item table.
The loader can map a create tuple loading action in the pattern mapping table
corresponding to each item in the item metadata table. The loader can map an update
tuple loading action in the pattern mapping table corresponding to each attribute in the
attribute metadata table. The loader can map a create tuple loading action in the pattern
mapping table corresponding to each group in a link, and map an assign action tuple
loading action in the pattern mapping table corresponding to each pair in the same link
corresponding to each link in the link pattern table. The loader can form a tree structure
with the document data. The loader can traverse the formed tree and update the relational
database table(s) according to the rows of the pattern mapping table.

[0051] In other embodiments of the system, the system can also be provided with an
optimizer for refining the metadata. The optimizer can eliminate duplicate particle
references in the metadata. The optimizer can simplify references to corresponding
elements, links and attributes in the metadata.

[0052] In other embodiments of the invention, the document can be an XML document. 
The document-type definition can be a DTD. The data can be tagged data. 

Brief Description of the Drawings

In the drawings: 

[0053] Fig. 1 is a schematic view detailing a system for generating a relational schema
from a document type definition, forming a relational database from the relational schema
and loading the contents of an extensible document into the relational database according
to the relational schema.

[0054] Fig. 1A is a diagrammatic representation of the system of Fig. 1 showing the
extraction of a document-type definition from an extensible document, the generation of a
relational schema therefrom and the loading of data contained in the extensible document
into the relational database.

[0055] Fig. 1B is a schematic representation of interaction between tables created in the
system and method shown in Figs. 1 and 1A.

[0056] Fig. 2 is a flowchart detailing three broad method steps according to the invention
schematically shown in Figs. 1, 1A and 1B, namely, storing the document-type definition
into metadata tables, creating a relational database table schema from the metadata of the
metadata tables, and loading data contained in an extensible document into tables
contained in the formed relational schema.

[00571 Fig. 3 is a flowchart detaihng the step for storing the document-type definition 
into metadata tables shown in Fig. 2 in which a metadata item table is created and fiUed 
with element-types declared in the document-type definition. 
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[0058] Fig. 4 is a flowchart detailing another step of the method shown in Fig. 2 of 
storing the document-type definition information into metadata tables in which a 
metadata attribute table is created and filled with attributes defined in the document-type 
definition. 

[0059] Fig. 5 is a flowchart showing another step of the method shown in Fig. 2 of 
storing the document-type definition table into metadata tables including the step of 
buUding groups and forming a metadata nesting table fi-om the metadata item table formed 
in Fig. 3. 

[0060] Fig. 5A is a flowchart detailing a portion of the flowchart of Fig. 5 corresponding
to the steps to be performed when an element type is encountered during generation of
the metadata nesting table.

[0061] Fig. 6 is a flowchart detailing the step of the method of Fig. 2 in which a relational
database schema is generated from the metadata tables including the step of forming tables
for each element type in the metadata item table formed in Fig. 3 with default fields
provided therein.

[0062] Fig. 7 is a flowchart detailing another method step from the method shown in Fig.
2 in which tables are created to form a relational table schema from the metadata tables in
which columns are added to the tables formed in the method shown in Fig. 6 for each of
the attributes defined in the metadata attribute tables formed in Fig. 4.

[0063] Fig. 8 is a flowchart detailing a method step corresponding additionally to the
method step in Fig. 2 in which a relational schema is created from the metadata tables in
which nesting relationships are determined between the attributes of the various tables
and index columns are added to the various tables in the schema corresponding to those
nesting relationships identified in the method step in Fig. 2 in which the metadata nesting
table is constructed.

[0064] Fig. 9 is a flowchart corresponding to the initialization of the pattern mapping
table as described in Figs. 1, 1A, 1B and 2.

[0065] Fig. 10 is a flowchart corresponding to the method step of Fig. 9 in which at least
one link table is initialized.

[0066] Fig. 11 is a flowchart corresponding to a method step in Fig. 9 in which the
pattern mapping table is initialized from the data contained in the metadata tables formed
according to the method steps of Fig. 2.

[0067] Fig. 12 is a flowchart corresponding to the method step of Fig. 2 corresponding to
loading data contained in an extensible document into the relational tables formed in the
method steps of Fig. 2.

[0068] Fig. 13 is a flowchart corresponding to the traversal of a node tree defined in the
flowchart shown in Fig. 12 for traversing the node tree and inserting data into the
relational table schema formed in the method steps of Fig. 2.

[0069] Fig. 14 is an example of a node tree discussed with respect to Figs. 12-13 having
data and complying with a document-type definition corresponding to Example 1
described in the Background section herein.

Description of the Preferred Embodiment 
[0070] According to the invention, a relational schema is created out of a DTD, metadata
is extracted from the DTD that describes the DTD and that illustrates how the DTD
maps to the schema, and XML data is loaded into the generated relational schema that
adheres to the DTD according to the metadata. In this manner, and as a direct result of
the metadata analysis and storage, the data semantics implied by the XML are maintained
so that more accurate and efficient management of the data can be performed.
[0071] Starting with a DTD for XML documents containing data (rather than analyzing
the relationships between the actual elements of the XML data), all of the information in
the DTD is captured into metadata tables, and then the metadata tables are queried to
generate the relational schema. Then, the data contained in the XML document can be
loaded into the generated relational schema. This method can be described by three broad
steps:

[0072] First, the DTD is stored as metadata in tables - i.e., the metadata is used to
describe the information of the DTD associated with the XML documents. This
approach provides flexibility to manipulate the DTD by standard SQL queries.

[0073] Second, a relational schema is generated from the metadata stored in the metadata
tables. This step provides additional flexibility in that, although the relational schema can
be directly mapped from the metadata tables according to the invention, the metadata
tables can also be queried to do optimizing or restructuring on the metadata tables
representative of the XML data structure stored in the DTD.

[0074] Third, the data contained in the XML document is extracted from the document 
and stored into the tables defined by the relational schema, which is generated in the 
previous step, by using the associated metadata tables. 

[0075] According to the invention, the inventive metadata-driven approach includes the
following beneficial characteristics:

[0076] For Storing: All of the information contained in the DTD is captured in the 
metadata tables. It is anticipated that the XML document and the DTD can be 
reconstructed from the relational data and metadata as needed. 
[0077] For Mapping: The generated relational schema is rich in relationships that are
useful in processing queries.

[0078] For Loading: Mappings between the XML document and the final relational 
schema are captured in the metadata tables. It is contemplated that the relational data can 
be synchronized with XML data, which means whenever there is a data update in the 
relational data, the effect is also reflected in the XML data, and vice versa. 
[0079] It will be understood that it has been assumed that there exists only one external
DTD file for compliant XML documents, that the file has no nested DTDs, and that
there is no internal DTD in the XML files. Of course, to the extent that XML documents
are encountered that include these items, this requirement can be met by
pre-processing XML documents with nested or internal DTDs as needed.
Figure 1

[0080] Turning now to the drawings and to Fig. 1 in particular, a system 10 is shown for
automatically loading a document 12 into a relational database 14. As can be clearly seen
from Fig. 1, the document 12 comprises a first data portion 16 and a second document
definition portion 18. It will be understood that the document 12 is preferably an XML
document and the first data portion 16 is preferably a set of tagged data normally
found in a document formatted in the XML language, and the tags are compliant with the
second document definition portion 18 that comprises a document-type definition (DTD)
as is well known in the art (the document 12 is shown surrounded by a broken line in Fig.
1 indicating a possible separation of the DTD 18 and the data 16 since the DTD 18 can be
provided separately from the XML data 16 as is also well known). As can also be clearly
seen from Fig. 1, the relational database 14 comprises a first storage portion 20 and a
second data definition portion 22. It will be understood that the relational database 14
can be any of the well-known relational databases. The first data storage portion 20 is
typically, and preferably, a set of tables in the relational database 14. The second data
definition portion 22 preferably comprises a relational schema as is typically used to
model, outline or diagram the interrelationship between tables in a relational database.
[0081] It is an important feature of this invention that the DTD 18 (i.e., the second
document definition portion 18) is loaded by the system 10 and used in metadata format
to generate the relational schema of the second data definition portion 22. Then, the
XML data stored in the first data portion 16 of the XML document 12 is loaded by the
system 10 into the tables making up the first data storage portion 20.
[0082] In order to accomplish these functions, the system 10 comprises an extractor 24,
an optimizer 26, a generator 28 and a loader 30, all of which are interconnected to a storage
unit 32. As contemplated by this invention, the storage unit 32 comprises at least a
metadata table storage portion 34 and a pattern mapping table storage portion 36.
[0083] According to the method of this invention, which will be hereinafter described in
greater detail, the system 10 reads the DTD 18 with the extractor 24 and stores data
representative of the DTD 18 in metadata tables in the metadata tables storage portion
34. From the data stored in metadata tables, the generator 28 generates the relational
schema 22 in the relational database 14. In an optional loop, the optimizer 26 can
massage the data stored in the metadata tables 34 to create a more efficient set of inputs
for the generator 28 which, in turn, results in the generation of a more efficient relational
schema 22.

[0084] Next, once the relational schema 22 has been generated by the generator 28, a
pattern-mapping table 36 is generated from the metadata tables and fed as an input to the
loader 30 (in addition to the input of the XML data 16 from the document 12) which, in
turn, provides an input to load the tables 20 and the relational database 14 with the XML
data 16 stored in the document 12.

Figure 1A

[0085] The automatic loading of an XML document 12 into a relational database 14
according to a document-type definition 18 contained in the XML document 12 is shown
in greater detail in Fig. 1A. One feature of this invention is the importation of the
information contained in the document-type definition 18 to the extractor 24 to create the
metadata tables 34 (referred to herein as DTDM, short for document-type definition
metadata). As can be seen from Fig. 1A, the metadata tables 34 comprise a DTDM-Item
table 90 generally made up of elements and groups defined in the DTD 18, a
DTDM-Attribute table 92 generally made up of attributes of elements and groups contained in
the DTD 18, and a DTDM-Nesting table 94 generally made up of nesting relationships
contained in the DTD 18 as identified by the extractor 24.

[0086] The metadata tables 34 are then fed to the generator 28 and, optionally, the
optimizer 26, to create the link pattern and pattern mapping tables 36. It should be
understood that the optimizer 26 is entirely optional and can be omitted without
departing from the scope of this invention. When used, the optimizer 26 provides the
additional benefits discussed herein. The tables 36 comprise an IM-Item table 96 which
contains mapping information relating to the DTDM-Item table 90, an IM-Attribute table
98 that contains mapping information relating to the DTDM-Attribute table 92, an
IM-Nesting table 100 that contains mapping information relating to the DTDM-Nesting table
94, and a TS-JC table 102 for containing table schema and join constraint information for
assisting in the generation of the table schema 22 for the relational database 14. It will be
understood that the metadata tables 34 are preferably necessary for generation of the table
schema 22. However, it has been found that the generation of the link pattern and pattern
mapping tables 36 can result in the generation of a more efficient table schema 22 for the
relational database 14. Then, either the metadata tables 34 or, if the optimizer 26 is
employed, the pattern and pattern mapping tables 36 are fed to the loader 30 to create
and fill the tables 20 and the relational database 14 according to the generated table schema
22 therein.

Figure 1B

[0087] A schematic representation of the tabular interaction according to this invention is
shown in Fig. 1B wherein the tables shown in Fig. 1B refer to the components of the
invention shown in Figs. 1 and 1A with like reference numerals. Dashed lines
interconnecting the illustrated tables indicate a relationship between a field of one table
with a field in the connected table.

[0088] Turning now to Figs. 2-13, the method of automatically loading XML data into a 
relational database according to a relational schema defined by a document-type definition 
will now be described. 

Figure 2 

[0089] Fig. 2 describes broad method steps of the invention, shown mainly by the steps
surrounded by a double-line frame, wherein step 40 indicates that data representative of
the DTD 18 is stored (via the extractor 24) into the metadata tables storage portion as
metadata representative of the DTD 18 which is referred to as the DTDM tables 34.

[0090] Step 42 indicates the second broad method step of this invention wherein the
relational schema 22 is generated (via the generator 28) from the DTDM tables.

[0091] Step 44 indicates the final broad method step of this invention wherein the XML
data 16 from the document 12 is loaded (via the loader 30) into the tables 20 of the
relational database 14 according to the relational schema 22 generated in step 42.

[0092] Fig. 2 also describes more detailed method steps for each of the broad method
steps 40, 42 and 44.

[0093] Namely, step 40 of storing the DTD 18 into the DTDM tables 90, 92 and 94
preferably comprises steps of creating and filling the DTDM-Item table 90 in the
metadata tables 34 (shown by reference number 46 and described in greater detail in Fig.
3), creating and filling a DTDM-Attribute table 92 in the metadata tables 34 (shown by
reference number 48 and described in greater detail in Fig. 4), creating and storing a
DTDM-Nesting table 94 in the metadata tables 34 (shown by reference number 50 and
described in greater detail in Fig. 5), and initializing a pattern mapping table (shown by
reference number 58 and described in greater detail in Figs. 9-11).
[0094] Step 42 of creating the relational table schema 22 from the metadata tables 34
preferably comprises the steps of creating tables in the relational database 14 (shown by 
reference number 52 and described in greater detail in Fig. 6), adding columns to the tables 
created in step 52 to correspond to attributes from the metadata tables 34 created in step 
48 (shown by reference number 54 and described in greater detail in Fig. 7), and adding 
nesting relationships indicated by the DTDM-Nesting table 94 stored in the metadata 
tables 34 in step 50 (shown by reference number 56 and described in greater detail in Fig. 
8). 

[0095] Step 44 of loading the XML data 16 of the document 12 into the tables 20 of the
relational database 14 according to the relational schema 22 generated in step 42 
preferably comprises the step of loading the XML data 16 contained in the document 12 
into the tables 20 of the relational database 14 according to the relational schema 22 
generated herein (shown by reference number 60 and described in greater detail in Figs. 
12-13). 

[0096] It will be understood that the focus of the metadata extraction steps begins with 
three empty DTDM tables 34 - the DTDM-Item table 90, the DTDM-Attribute table 
92, and the DTDM-Nesting table 94, the function and features of which will be explained
in detail below. 

[0097] The DTD 18 is first stored into the metadata tables 34 so that it can be optionally
restructured, and then the relational schema 22 can be generated from the metadata tables
34. The storing stage identifies the characteristics of the DTD 18, and stores it as the
metadata tables 34. The (optional) restructuring stage can identify the multi-valued
attributes of the DTD 18, and can also identify items that could be represented as
attributes. Mapping the DTD 18 into the relational schema 22 is achieved by applying
mapping rules defining transformations over the metadata tables 34 storing the DTD 18.



WO 01/61566 PCT/US01/05105



[0098] One initial step is identifying the types of the objects that will be found in these
types of data-containing XML documents 12. Three kinds of metadata have been
identified as being relevant for storing these properties: items, attributes, and 
relationships. The three metadata tables 34 for storing the items, attributes and 
relationships defined in the DTD 18 and the properties of the DTD 18 captured by each 
table 90, 92 and 94 are defined as will be hereinafter described. Every item, attribute and 
nesting relationship in the tables 90, 92 and 94 is preferably assigned a unique id as will 
be described in greater detail below. 

[0099] An item represents an object in the DTD 18 that contains, or is contained by, 
other objects. An attribute is a property of an item. An item can have multiple unique 
attributes. The attributes of an element in a DTD are all the attributes of the 
corresponding item in the metadata generated by the invention described herein. A nesting 
relationship is used to show the hierarchical relationships between two items. It denotes 
that a child item is directly nested in a parent item. 
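
By way of illustration only, the three kinds of metadata described above might be modeled as follows. This is a minimal sketch rather than the disclosed implementation; the class names, the field names and the helper `children_of` are assumptions made for the example, and the item IDs in the sample data are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An object in the DTD that contains, or is contained by, other objects."""
    id: int
    name: str


@dataclass(frozen=True)
class Attribute:
    """A property of an item; pid is the ID of the item that owns it."""
    id: int
    pid: int
    name: str


@dataclass(frozen=True)
class Nesting:
    """States that the child item (to_id) is directly nested in the parent (from_id)."""
    id: int
    from_id: int
    to_id: int


def children_of(item_id, nestings):
    """Return the IDs of the items directly nested in the given parent item."""
    return [n.to_id for n in nestings if n.from_id == item_id]


# Hypothetical data: a 'book' item that directly contains 'booktitle' and 'author'.
items = [Item(1, "book"), Item(2, "booktitle"), Item(3, "author")]
nestings = [Nesting(1, 1, 2), Nesting(2, 1, 3)]
print(children_of(1, nestings))
```

Keeping each nesting relationship as its own record, rather than as a list held by the parent item, mirrors the way the relationships are later stored as rows of a separate table.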

Figure 3

[00100] The step of creating and filling the DTDM-Item table 90 identified by
reference number 46 in Fig. 2 will now be described in greater detail with respect to Fig. 3.
As illustrated in Fig. 3, processing moves to step 62 in which, initially, a pair of default
items are created in the DTDM-Item table 90 referred to as "PCDATA" and "GROUP" (also
referred to as ANY_GROUP in the figures). Proposed SQL statements that could accomplish
this task are shown in the note associated with step 62 referred to by reference number
64. Once these initial default items are created in step 62, processing moves to step 66
which initiates a loop for each Element type declaration in the DTD 18. For each of the
Element type declarations, a row is created in the DTDM-Item table 90 as shown by the
proposed SQL statement in note 68. Processing then moves to decision block 70 in
which it is determined whether any additional Element type declarations exist in the DTD
18. If so, processing returns to step 66. If not, processing ends.

[00101] The DTDM-Item table 90 preferably stores element types and groups.
The table 90 captures the existence property of an element as the Type of the item. It
also captures a grouping property by creating a new group item for each group.



DTDM-Item Table Field Contents

| Field | Contents |
| ID    | Internal ID for items. |
| Name  | Element type or group name. |
| Type  | Defines the type of this item within this domain: PCDATA, ELEMENT.ELEMENT, ELEMENT.EMPTY, ELEMENT.ANY, ELEMENT.MIX, and GROUP. |



[00102] The Type field defines the type of an item, i.e., type of the element
content in an element type declaration. ELEMENT.ELEMENT means an element content.
ELEMENT.MIX means a mixed content. ELEMENT.EMPTY means an EMPTY content.
ELEMENT.ANY means an ANY content. There are two new item types, i.e., PCDATA and
GROUP. PCDATA means a PCDATA definition, and GROUP means a group definition.
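
The flow of Fig. 3 might be sketched as follows using an in-memory SQLite database. This is a hedged illustration only: the SQL statements of notes 64 and 68 are not reproduced in this text, so the table name `DTDM_Item`, the function `fill_dtdm_item` and the input format (a list of name and type pairs assumed to come from a DTD parser that is not shown) are assumptions. IDs are assigned in insertion order here and need not match the numbering used in Example 1.

```python
import sqlite3


def fill_dtdm_item(conn, element_types):
    """Create DTDM_Item, add the two default items, then one row per element type.

    element_types is an iterable of (name, item_type) pairs, assumed to have been
    produced already by some DTD parser (not shown).
    """
    conn.execute(
        "CREATE TABLE DTDM_Item (ID INTEGER PRIMARY KEY, Name TEXT, Type TEXT)"
    )
    # Step 62: create the two default items.
    conn.execute("INSERT INTO DTDM_Item (Name, Type) VALUES ('PCDATA', 'PCDATA')")
    conn.execute("INSERT INTO DTDM_Item (Name, Type) VALUES ('ANY_GROUP', 'GROUP')")
    # Steps 66-70: one row for each element type declaration in the DTD.
    for name, item_type in element_types:
        conn.execute(
            "INSERT INTO DTDM_Item (Name, Type) VALUES (?, ?)", (name, item_type)
        )


# Hypothetical input: two element type declarations with their content types.
conn = sqlite3.connect(":memory:")
fill_dtdm_item(conn, [("book", "ELEMENT.ELEMENT"), ("affiliation", "ELEMENT.ANY")])
rows = conn.execute("SELECT ID, Name, Type FROM DTDM_Item ORDER BY ID").fetchall()
print(rows)
```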

Figure 4 

[00103] The step of creating and filling the DTDM-Attribute table 92 identified by
reference number 48 in Fig. 2 will now be described in greater detail with respect to Fig. 4.
As shown in Fig. 4, processing moves to step 72 in which the item_ID of PCDATA is
retrieved from the DTDM-Item table 90. Processing then moves to step 74 in which
default attribute values for PCDATA are created in the DTDM-Attribute table 92. A
proposed SQL statement for accomplishing this task is provided in note 76 associated
with step 74. Processing then moves to step 78, which initiates a loop for each element
type declaration in the DTD 18.

[00104] Processing then moves to step 80 in which the item_ID for each of the
element type declarations in the DTDM-Item table 90 is retrieved. Processing then
moves to step 82, which initiates a loop for each attribute of this particular element type.
For each attribute of this element type declaration, a row is inserted into the
DTDM-Attribute table 92 providing attribute information corresponding to the elements of the
DTDM-Item table 90. A proposed SQL statement for accomplishing this task is
provided in note 84 associated with step 82.

[00105] After each row insertion into the DTDM-Attribute table 92, processing
moves to decision block 86 to determine whether additional attributes exist for this
element type. If so, processing returns to step 82 to process additional attributes. If not,
processing moves to decision block 88 to determine whether additional elements exist in
the DTD 18 (i.e., as stored in the DTDM-Item table 90). If so, processing returns to
step 78. If not, processing ends.

[00106] The DTDM-Attribute table 92 stores attributes of elements or groups. 



DTDM-Attribute Table Field Contents

| Field   | Contents |
| ID      | Internal ID of this attribute. |
| PID     | ID of parent item of this attribute. |
| Name    | Name of this attribute, e.g., AuthorIDs, id. |
| Type    | Type of the attribute, e.g., ID and IDREFS. |
| Default | A keyword or a default literal value of this attribute, e.g., #IMPLIED. |



[00107] Note that, for now, the ID/IDREF(S) attributes that represent the element
reference properties are stored simply as attributes. Later, during a mapping stage, the
element reference property will be captured and stored in an additional metadata table,
denoted as the TS-JC table 102 in Fig. 1A (and JoinConstraint 102 in Fig. 1B).
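
The nested loops of Fig. 4 might be sketched as follows. This is an illustration under stated assumptions, not the statements of notes 76 and 84, which are not reproduced in this text: the table and column names (in particular `DefaultValue`, chosen because DEFAULT is an SQL keyword), the function `fill_dtdm_attribute` and the dictionary input format are all hypothetical. The sample attribute is the IDREFS attribute of contactauthors from Example 1.

```python
import sqlite3


def fill_dtdm_attribute(conn, attlists):
    """Fill DTDM_Attribute from attlists: {element name: [(attr name, type, default)]}.

    Assumes DTDM_Item already exists and holds a PCDATA row and one row per element.
    """
    conn.execute(
        "CREATE TABLE DTDM_Attribute ("
        "ID INTEGER PRIMARY KEY, PID INTEGER, Name TEXT, Type TEXT, DefaultValue TEXT)"
    )
    insert = (
        "INSERT INTO DTDM_Attribute (PID, Name, Type, DefaultValue) VALUES (?, ?, ?, ?)"
    )
    # Steps 72-74: look up the PCDATA item and give it its default 'value' attribute.
    (pcdata_id,) = conn.execute(
        "SELECT ID FROM DTDM_Item WHERE Name = 'PCDATA'"
    ).fetchone()
    conn.execute(insert, (pcdata_id, "value", "PCDATA", "#REQUIRED"))
    # Steps 78-88: outer loop over element types, inner loop over their attributes.
    for element, attributes in attlists.items():
        (item_id,) = conn.execute(
            "SELECT ID FROM DTDM_Item WHERE Name = ?", (element,)
        ).fetchone()
        for name, attr_type, default in attributes:
            conn.execute(insert, (item_id, name, attr_type, default))


# Minimal hypothetical setup: an item table with PCDATA and one element type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DTDM_Item (ID INTEGER PRIMARY KEY, Name TEXT, Type TEXT)")
conn.executemany(
    "INSERT INTO DTDM_Item (Name, Type) VALUES (?, ?)",
    [("PCDATA", "PCDATA"), ("contactauthors", "ELEMENT.EMPTY")],
)
fill_dtdm_attribute(conn, {"contactauthors": [("authorIDs", "IDREFS", "#REQUIRED")]})
rows = conn.execute("SELECT PID, Name, Type FROM DTDM_Attribute ORDER BY ID").fetchall()
print(rows)
```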

Figure 5

[00108] The step of creating and storing the DTDM-Nesting table 94 identified by
reference number 50 in Fig. 2 will now be described in greater detail with respect to Fig. 5.
Processing moves to step 402 in which the DTDM-Nesting table 94 is initialized.
Pseudocode indicative of the steps performed in step 402 is given in note 404
associated with step 402. Processing then moves to step 406, which initiates a loop for
every element type declaration found in the DTD 18. Processing then moves to step 408,
which retrieves the type, i.e., MIXED, PCDATA, ANY, ELEMENT, and EMPTY, of the particular
element type being examined in the loop initiated at step 406. Processing then moves to
step 410 in which the DTDM-Item table 90 is queried to return the identification (ID) of
the element type identified in step 408.

[00109] Processing then moves to decision block 412 which determines whether
the current element type is of type MIXED. If so, processing moves to step 414 in which
a new group is created in the DTDM-Item table 90 with type CHOICE, wherein a
proposed SQL statement to accomplish this task is shown in note 416 associated with
step 414. Processing then moves to step 418 in which a nesting relationship is created
from the current element type to the newly-created group as shown by the proposed
SQL statement in note 420 associated with step 418. Processing then moves to step 422
in which a nesting relationship is created from this group to element type PCDATA.
Processing then moves to step 424 in which all of the nesting relationships from this
group to its children are created by function fill_DTDM_Nesting_Item shown in detail in
Fig. 5A (via indicator 5A referred to by numeral 444). Processing then moves to decision
block 426 in which it is determined whether there are additional element types in the
DTD 18 to be processed for the loop initiated at step 406. If so, processing returns to
step 406.

[00110] If the test performed at decision block 412 fails, processing moves to
decision block 428 which determines whether the current element type is of type ANY. If
so, processing moves to step 430 in which one relationship from the current element is
created to the item titled ANY_GROUP. A proposed SQL statement to accomplish this task
is identified in note 432 associated with step 430 in Fig. 5. Processing then moves to
decision block 426, which has been previously described.

[00111] If the test performed at decision block 428 fails, processing moves to
decision block 434 which determines whether the current element type is of type PCDATA.
If so, processing moves to step 436 in which a relationship to the previously-created
PCDATA item is created in accordance with the proposed SQL statement shown in note
438 associated with step 436 in Fig. 5. Processing then moves to decision block 426,
which has been previously described.

[00112] If the test performed at decision block 434 fails, processing moves to
decision block 440 which determines whether the current element type is of type
ELEMENT. If so, processing moves to step 442 which calls a function titled
fill_DTDM_Nesting_Item (element_type, -1), the details of which are described in Fig.
5A (by the connector identified with 5A and indicated by numeral 444). After this
function has completed, processing moves to decision block 426, which has been
previously described.

[00113] If the test performed at decision block 440 fails, processing moves to step
446 which notes that the current element type is EMPTY. Processing then moves to
decision block 426, which has been previously described.

[00114] Once all of the elements have been processed, i.e., all of the element type
declarations as identified in the loop initiated at step 406 have been processed, processing
ends.
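
The chain of decision blocks 412, 428, 434 and 440 amounts to a dispatch on the content type of each element type declaration. The following is a simplified sketch of that dispatch, offered as an assumption-laden illustration: the function `build_nesting`, the input tuples and the generated group name `G_<element>` are hypothetical, nesting rows are reduced to (parent ID, child ID) pairs, and the recursive handling of nested groups performed by fill_DTDM_Nesting_Item in Fig. 5A is replaced by a flat list of child names.

```python
def build_nesting(elements, item_ids):
    """Return (item_ids, nesting) after dispatching on each element's content type.

    elements: list of (name, kind, children), kind being one of
              MIXED, ANY, PCDATA, ELEMENT or EMPTY.
    item_ids: mapping of item name to ID; must contain PCDATA and ANY_GROUP.
    """
    item_ids = dict(item_ids)  # copy, because new group items are added
    next_id = max(item_ids.values()) + 1
    nesting = []  # (parent ID, child ID) pairs
    for name, kind, children in elements:
        parent = item_ids[name]
        if kind == "MIXED":
            # Steps 414-424: new choice group, element -> group, group -> PCDATA,
            # then group -> each child element.
            group = next_id
            item_ids[f"G_{name}"] = group
            next_id += 1
            nesting.append((parent, group))
            nesting.append((group, item_ids["PCDATA"]))
            nesting.extend((group, item_ids[c]) for c in children)
        elif kind == "ANY":
            # Step 430: a single relationship to the ANY_GROUP item.
            nesting.append((parent, item_ids["ANY_GROUP"]))
        elif kind == "PCDATA":
            # Step 436: a relationship to the shared PCDATA item.
            nesting.append((parent, item_ids["PCDATA"]))
        elif kind == "ELEMENT":
            # Step 442, simplified: one relationship per direct child.
            nesting.extend((parent, item_ids[c]) for c in children)
        elif kind != "EMPTY":
            raise ValueError(f"unknown content type: {kind}")
        # Step 446: EMPTY content creates no nesting relationship.
    return item_ids, nesting


# Hypothetical item IDs and declarations.
ids = {"PCDATA": 1, "ANY_GROUP": 2, "book": 3, "booktitle": 4,
       "affiliation": 5, "note": 6, "contact": 7}
decls = [
    ("book", "ELEMENT", ["booktitle"]),
    ("booktitle", "PCDATA", []),
    ("affiliation", "ANY", []),
    ("note", "MIXED", ["booktitle"]),
    ("contact", "EMPTY", []),
]
new_ids, nesting = build_nesting(decls, ids)
print(nesting)
```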

Figure 5A 

[00115] The contents of the function (i.e., fill_DTDM_Nesting_Item) identified by
the "5A" connectors 444 of Fig. 5 are described in greater detail with respect to Fig. 5A.
Once this function is called, processing moves to decision block 446 in which it is
determined whether the nesting relationships of the parent item are to be copied into the
group. If so, processing moves directly to step 448 in which the group_ID is treated as
an element_ID. If not, processing moves to step 450 in which the DTDM-Item table 90
is queried to determine the ID of the parent item (i.e., element type or group) as
element_ID. A proposed SQL statement to accomplish this task is shown in note 452
associated with step 450 in Fig. 5A. In either case, processing then moves to step 454,
which initiates a loop for the group or element identified in step 448 as element_ID.
Processing then moves to step 456, which retrieves the object reference to the current
element type or group being processed. Processing then moves to decision block 458,
which determines whether the element type or group corresponding to the object
reference identified in step 456 is of type group. If so, processing moves to step 460,
which retrieves the group ID and stores this ID in variable ref_ID. If not, processing
moves to step 462 in which the type is determined to be an element type declaration.
Following either step 460 or step 462, processing then moves to step 464 in which the
previously-determined object reference is stored into the DTDM-Nesting table 94 in a
manner consistent with note 466 which shows a proposed SQL statement for
accomplishing this task. Processing then moves to decision block 466 in which it is
determined whether any additional references need to be processed. If so, processing
returns to step 454. If not, processing terminates and returns to continue processing at
the point Fig. 5 was left via connector 444.

[00116] The DTDM-Nesting table 94 captures relationships between different
items, i.e., nesting, schema ordering, and occurrence properties.



DTDM-Nesting Table Field Contents

| Field    | Contents |
| ID       | Internal ID of this nesting relationship. |
| FromID   | ID of parent item of this nesting relationship. |
| ToID     | ID of child item of this nesting relationship. |
| Ratio    | Cardinality between the parent element and child element. |
| Optional | Used to indicate whether existence of the child element is optional (i.e., true if optional, false otherwise). |
| Index    | ... of the child element. |



[00117] Fields FromID and ToID reference a parent item and a child item that
participate in a nesting relationship. The Index field captures the schema ordering
property; it denotes the position of this child item in the parent item's definition. In a
sequence group, each child item will have a different value for indices (i.e., 1, 2, ...); for all
children in a choice group, the index fields will all be the same (i.e., 0).
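
The Index rule just stated can be illustrated with a short sketch. The function name `child_indices` and the string labels "sequence" and "choice" are assumptions made for the example only.

```python
def child_indices(group_kind, children):
    """Pair each child with its Index value for the given kind of group."""
    if group_kind == "sequence":
        # Position in the parent's definition, counted from 1.
        return [(child, position) for position, child in enumerate(children, start=1)]
    if group_kind == "choice":
        # Alternatives carry no order, so every child receives index 0.
        return [(child, 0) for child in children]
    raise ValueError(f"unknown group kind: {group_kind}")


print(child_indices("sequence", ["author", "affiliation"]))
print(child_indices("choice", ["book", "monograph"]))
```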
[00118] The occurrence property for a child element is captured by a combination
of the Ratio and Optional fields. The Ratio field shows cardinality between the
instances of the parent item and of the child item. Note that, since the nesting
relationships are always from one element type to its sub-elements in the DTD 18, there
are only one-to-one or one-to-many nesting relationships in the Ratio column. The
many-to-one and many-to-many relationships are not captured by the Ratio field but
rather are captured by ID/IDREF(S) attributes in the DTD. The Optional field has value
of true or false depending on whether this relationship is defined as optional in the
DTD. The following table shows how the Ratio and Optional fields combine to
represent the occurrence properties:


Occurrence Property Indicators

| Occurrence Property | Ratio | Optional |
| No indicator        | 1:1   | false    |
| ?                   | 1:1   | true     |
| +                   | 1:n   | false    |
| *                   | 1:n   | true     |
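
The combination of the Ratio and Optional fields follows directly from the standard meaning of the DTD occurrence indicators (none for exactly one, ? for zero or one, + for one or more, * for zero or more). A minimal sketch of the mapping is given below; the names `OCCURRENCE` and `occurrence_fields` are assumptions made for the example.

```python
# Occurrence indicator -> (Ratio, Optional)
OCCURRENCE = {
    "": ("1:1", False),   # no indicator: exactly one
    "?": ("1:1", True),   # zero or one
    "+": ("1:n", False),  # one or more
    "*": ("1:n", True),   # zero or more
}


def occurrence_fields(particle):
    """Split a content particle such as 'author*' into (name, ratio, optional)."""
    indicator = particle[-1] if particle[-1:] in ("?", "+", "*") else ""
    name = particle[: len(particle) - len(indicator)]
    ratio, optional = OCCURRENCE[indicator]
    return name, ratio, optional


print(occurrence_fields("author*"))
print(occurrence_fields("affiliation?"))
```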



[00119] The steps for extracting the metadata tables 90, 92 and 94 can be
summarized as follows:

[00120] Create one PCDATA item. This one item will represent all occurrences of
#PCDATA in the DTD. The PCDATA item will be used to convert all element-to-PCDATA
relationships, such as found in a mixed content definition, to element-to-element
relationships, thus unifying these two types of nesting relationships. A PCDATA item has
one attribute called value that is used to capture the text value of this PCDATA item.
[00121] Create an item for each element type declaration. Tuples in the 
DTDM-Item table 90 are the elements directly defined by the DTD 18. 
[00122] Create an item for each grouping relationship in each element type
declaration. For each element type declaration in a DTD 18, a group item is created for
each group in the item, and the group in the element type declaration is replaced with the
corresponding group item. In Example 1, items 14 to 16 represent the groups (author* |
editor), (author, affiliation?), and (book | monograph) respectively. Defining
each group as an item is used to convert nested groups into nesting relationships between
items.

[00123] For example, the definition of element book shows that book is composed
of booktitle, and a group of authors or an editor. A new item G1 would thereby be
defined for the group (author* | editor), and the element definition of book would be
changed to <!ELEMENT book (booktitle, G1)>.
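
The replacement of nested groups by group items might be sketched as follows. This is an illustrative assumption rather than the disclosed procedure: the content model is represented as a hypothetical nested tuple `(kind, [particles])` instead of being parsed from DTD text, the function name `extract_groups` is invented for the example, and occurrence indicators attached to a whole group are not handled.

```python
def extract_groups(model, groups):
    """Replace every nested group in a content model with a named group item.

    model:  (kind, particles), kind being 'seq' or 'choice'; each particle is
            either a string or another (kind, particles) tuple.
    groups: dict that receives the extracted groups, keyed by G1, G2, ...
    Returns the flattened model, in which nested groups appear by name only.
    """
    kind, particles = model
    flat = []
    for particle in particles:
        if isinstance(particle, tuple):
            inner = extract_groups(particle, groups)  # flatten deeper groups first
            name = f"G{len(groups) + 1}"
            groups[name] = inner
            flat.append(name)
        else:
            flat.append(particle)
    return (kind, flat)


# <!ELEMENT book (booktitle, (author* | editor))>
groups = {}
book = extract_groups(("seq", ["booktitle", ("choice", ["author*", "editor"])]), groups)
print(book)
print(groups)
```

After the call, `book` refers to the group only by its name, which is what allows the nested group to be stored later as an ordinary nesting relationship between two items.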

[00124] Store nesting relationships. After defining the PCDATA item and all
group items, the hierarchical definitions of elements can be described as nesting
relationships between two items. An element definition is a sequence (or choice) of n
sub-elements stored as n nesting relationships with index fields in the DTDM-Nesting table
94.



[00125] For example, the element definition <!ELEMENT book (booktitle, G1)>
has two nesting relationships, i.e., between items book and booktitle, and between items
book and G1. These nesting relationships are shown in the DTDM-Nesting table 94
constructed in accordance with the DTD 18 of Example 1 (with IDs 1 and 2).

[00126] Store attributes. For items with attributes defined, those attributes are
stored in the DTDM-Attribute table 92. For example, the attribute AuthorIDs of item
contactauthors is stored in the DTDM-Attribute table 92 (with ID 1) constructed in
accordance with Example 1.

[00127] Store the ANY element type definitions. An ANY-typed element can
contain a mix of PCDATA and any defined element types, and thus can have relationships
with all other element types. To capture that relationship, a choice group item is created
called ANY GROUP (AG). Every element type definition with content ANY expresses its
relationships with all other element types with a one-to-many nesting relationship with
the AG item. Using Example 1, row 17 in the DTDM-Item table 90 is the AG group, and
nesting relationships with ID 24 through 36 are between this AG group and each of the
other element items and the PCDATA item, i.e., between this AG group and items 1 through
13, respectively, in the example described herein conforming to the DTD 18 of Example
1. The affiliation item, which is an ANY element type declaration, has a one-to-many
nesting relationship to the AG group (see line 23 in the DTDM-Nesting table 94 below).
[00128] Once the metadata has been extracted and mapped from the DTD 18
shown in Example 1 in the Background section, the metadata tables 34 have the following
structure.

[00129] The DTDM-Item table 90:

| ID | Name           | Type            |
| 1  | PCDATA         | PCDATA          |
| 2  | book           | ELEMENT.ELEMENT |
| 3  | booktitle      | ELEMENT.MIX     |
| 4  | article        | ELEMENT.ELEMENT |
| 5  | title          | ELEMENT.MIX     |
| 6  | contactauthors | ELEMENT.EMPTY   |
| 7  | monograph      | ELEMENT.ELEMENT |
| 8  | editor         | ELEMENT.ELEMENT |
| 9  | author         | ELEMENT.ELEMENT |
| 10 | name           | ELEMENT.ELEMENT |
| 11 | firstname      | ELEMENT.MIX     |
| 12 | lastname       | ELEMENT.MIX     |
| 13 | affiliation    | ELEMENT.ANY     |
| 14 | G1             | GROUP           |
| 15 | G2             | GROUP           |
| 16 | G3             | GROUP           |
| 17 | AG             | GROUP           |




[00130] The DTDM-Attribute table 92:

| ID | PID | Name      | Type   | Default   |
| 1  | 6   | authorIDs | IDREFS | #REQUIRED |
| 2  | 8   | name      | CDATA  | #REQUIRED |
| 3  | 9   | id        | ID     | #REQUIRED |
| 4  | 1   | value     | PCDATA | #REQUIRED |

[00131] The DTDM-Nesting table 94:

| ID | FromID | ToID | Ratio | Optional | Index |

[00132] Any discussion based on examples (and, specifically, Example 1) refers to
the above metadata table examples.

[00133] The rules used for mapping the DTD 18 (as stored in the metadata tables
described above) into the relational schema 22 are described in the following. The basic
idea behind these rules is that each item is to be mapped into a relational table 20 in the
database 14 according to the generated schema 22.

[00134] As discussed previously, the elements in XML documents are ordered.
This order is retained in the newly generated relational table schema. Groups are not
directly shown in the XML document, so they do not have an order property.

Figure 6 

[00135] The step of creating tables 20 in the relational database 14 (identified by
reference number 52 in Fig. 2) will now be described in greater detail with respect to Fig.
6. Turning to Fig. 6, processing moves to step 120 in which a query of the DTDM-Item
table 90 is performed to return all of the item types stored in the DTDM-Item table 90.
A proposed SQL statement to accomplish this task is shown in note 122 associated with
step 120 in Fig. 6. Processing then moves to step 124, which initiates a loop for every
item returned in the recordset selected in step 120. Processing within the loop then
moves to step 126 wherein a table 20 is created in the relational database 14 with some
key-type default fields, wherein the table name created in the database 14 corresponds to
the Name field in the DTDM-Item table 90 as returned in the recordset in step 120. A
proposed SQL statement to accomplish this task is shown in note 128 associated with
step 126 in Fig. 6. Processing then moves to decision block 130, which determines
whether additional items exist for processing. If so, processing returns to step 124. If
not, processing ends.

[00136] These steps perform a first mapping on the DTDM-Item table 90. That
is, for each ELEMENT.*-typed and PCDATA-typed item defined in the DTDM-Item table 90, a
table is created with two default columns: "iid" and "order". For each GROUP-typed
item, a table is created with only an "iid" column.

[00137] The metadata tables 34 are queried to get all non-group items by:

    SELECT name, type FROM DTDM-Item WHERE type LIKE "ELEMENT.%" OR type = "PCDATA"

[00138] After the name and type of the item are retrieved, the following queries are
performed to create the tables from the query recordset returned. So, for each item of type
"ELEMENT.*" or "PCDATA" in the query recordset result, the following query is
issued to create a table (e.g., for an item called ItemTable, with a default
primary key "iid" and a column called "order" in an integer format):

    CREATE TABLE ItemTable (iid INTEGER, order INTEGER, PRIMARY KEY iid)

[00139] For items of other types, a table will be created in a form such as:

    CREATE TABLE ItemTable (iid INTEGER, PRIMARY KEY iid)
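The first mapping can be tried out with a short Python sketch against an in-memory SQLite database. This is offered only as an illustration under stated assumptions: the `ITEMS` list is a hypothetical excerpt of the DTDM-Item metadata, the helper names are invented for the example, and the column name "order" is quoted because it is a reserved word in SQL, which the statements above do not address.

```python
import sqlite3

# Hypothetical excerpt of the DTDM-Item metadata: (name, type) pairs.
ITEMS = [
    ("PCDATA", "PCDATA"),
    ("book", "ELEMENT.ELEMENT"),
    ("booktitle", "ELEMENT.MIX"),
    ("G1", "GROUP"),
]

def create_item_tables(conn, items):
    """First mapping: one table per item; group items get no "order" column."""
    for name, item_type in items:
        if item_type == "PCDATA" or item_type.startswith("ELEMENT."):
            # "order" is a reserved word in SQL, so it is quoted here.
            ddl = f'CREATE TABLE "{name}" (iid INTEGER PRIMARY KEY, "order" INTEGER)'
        else:
            ddl = f'CREATE TABLE "{name}" (iid INTEGER PRIMARY KEY)'
        conn.execute(ddl)

def column_names(conn, table):
    """Return the column names of a table using SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

conn = sqlite3.connect(":memory:")
create_item_tables(conn, ITEMS)
```

After running the sketch, `column_names(conn, "book")` lists the two default columns, while `column_names(conn, "G1")` lists only the "iid" column.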

[00140] After identifying the basic tables and their required columns, any other
columns that are appropriate for those tables must be determined. First, columns
representing the attributes of each item are added to the table corresponding to
that item. Second, the columns of the various tables are interconnected according to the
detected nesting relationships therebetween.

Figure 7

[00141] The step of adding columns in the tables created in step 52 for attributes in
the DTDM-Attribute table 92 (as identified by reference number 54 in Fig. 2) will now be
described in greater detail with respect to Fig. 7. As shown in Fig. 7, processing moves to
step 132 in which a join query of the DTDM-Attribute table 92 with the DTDM-Item
table 90 is performed to return all of the attributes of an item from data contained in the
DTDM-Attribute table 92 and the DTDM-Item table 90. A proposed SQL statement
to accomplish this task is shown in note 134 associated with step 132 in Fig. 7.

[00142] Processing then moves to step 136, which initiates a loop for every
attribute returned in the recordset selected in step 132. Processing then moves to step
138 in which, based upon the attribute type of the particular row in the recordset
returned in step 132, a column-type variable is determined. A list of applicable column
types is shown in note 140 associated with step 138 in Fig. 7.

[00143] Processing then moves to step 142 in which the relational database schema
22 is altered to add this attribute and its type to its parent table. A proposed SQL
statement to accomplish this task is shown in note 144 associated with step 142 in Fig. 7.
Processing then moves to decision block 146 to determine whether additional attributes
need to be processed. If so, processing returns to step 136. If not, processing ends.

[00144] In this part of the inventive method herein, these steps perform a second
mapping on the DTDM-Attribute table 92. For each tuple in the DTDM-Attribute table
92, a column is created, named with the "Name" property of the tuple, in the relational
table that is identified by the pid index of the tuple. All of the column domains are
preferably strings, since the DTD 18 preferably only has one data type, i.e., character
data (CDATA, PCDATA). In this invention, it is not necessary to perform additional parsing on the data
values to determine their data types, although doing so would not depart from the scope
of this invention.

[00145] More generally and by way of summary, the above described steps
illustrate that the metadata tables 34 are queried to get all attributes for a given item name
X (see step 132 and its associated note 134):

    SELECT A.name, A.type FROM DTDM-Item I, DTDM-Attribute A WHERE I.name = X AND I.id = A.pid

[00146] Then, those attributes returned in the above recordset are placed in the
definition of the tables created in the first mapping by issuing the following queries (see
step 142 and its associated note 144):

    ALTER TABLE ItemTable ADD (A1.name A1.type, A2.name A2.type, ...)

[00147] Here, the A1, A2, etc. names and types are the tuples selected from the
query issued in connection with step 132 (as described in note 134).
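A minimal sketch of the second mapping, again using Python with an in-memory SQLite database, is given below. The metadata table names `item` and `attribute` are stand-ins chosen for this example (hyphenated names such as DTDM-Item would need quoting), the sample rows are an excerpt of Example 1, and every column is given a string domain as suggested above. SQLite accepts one added column per ALTER TABLE statement, so the sketch issues one statement per attribute rather than the combined form shown in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, pid INTEGER, name TEXT, type TEXT);
    INSERT INTO item VALUES (6, 'contactauthors', 'ELEMENT.EMPTY');
    INSERT INTO item VALUES (9, 'author', 'ELEMENT.ELEMENT');
    INSERT INTO attribute VALUES (1, 6, 'authorIDs', 'IDREFS');
    INSERT INTO attribute VALUES (3, 9, 'id', 'ID');
    CREATE TABLE contactauthors (iid INTEGER PRIMARY KEY, "order" INTEGER);
    CREATE TABLE author (iid INTEGER PRIMARY KEY, "order" INTEGER);
""")

def add_attribute_columns(conn):
    """Second mapping: one string column per attribute tuple, added to its parent table."""
    rows = conn.execute(
        "SELECT I.name, A.name FROM item I, attribute A "
        "WHERE I.id = A.pid ORDER BY A.id"
    ).fetchall()
    for table, column in rows:
        # One ALTER TABLE statement per attribute (an SQLite restriction).
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" TEXT')

add_attribute_columns(conn)
author_columns = [row[1] for row in conn.execute('PRAGMA table_info("author")')]
contact_columns = [row[1] for row in conn.execute('PRAGMA table_info("contactauthors")')]
```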



Figure 8 

[00148] The step of determining and adding nesting relationships to the relational
schema 22 (identified by reference number 56 in Fig. 2) will now be described in greater
detail with respect to Fig. 8. Turning to Fig. 8, processing moves to step 148 in which a
query of the DTDM-Nesting table 94 is performed to return all of the nesting
relationships of an item from the data contained in the DTDM-Nesting table 94 and the
DTDM-Item table 90 (i.e., to extract all of the item IDs involved in each nesting
relationship). A proposed SQL statement to accomplish this task is shown in note 150
associated with step 148 in Fig. 8.

[00149] Processing then moves to step 152, which initiates a loop for every nesting
relationship returned in the recordset selected in step 148. Processing then moves to
decision block 154, which determines whether the ratio of elements in the particular
nesting relationship is a 1-to-1 or a 1-to-n relationship. If the decision block 154
determines that the ratio is 1-to-1, processing moves to step 156 in which a foreign key is
inserted in the parent table of the relationship. A proposed SQL statement to accomplish
this task is shown in note 158 associated with step 156 in Fig. 8. If the decision block
154 determines that the ratio is 1-to-n, processing moves to step 160 in which a foreign
key is inserted into the child table in the relationship. A proposed SQL statement to
accomplish this task is shown in note 162 associated with step 160 in Fig. 8. In either
case, processing then moves to re-connector 164 and then to decision block 166, which
determines whether additional nesting relationships need to be processed. If so,
processing returns to step 152. If not, processing ends.

[00150] Again, in a general summary, a third mapping is thereby performed on the
metadata tables 34 in connection with the process shown in Fig. 8. For each tuple R in the
DTDM-Nesting table 94, the table corresponding to the from item is labeled as S and the
table corresponding to the to item as T participating in R. Then, if R is a one-to-one nesting
relationship, the iid of T is stored as a foreign key in S (i.e., T_iid is stored as a column in
S); if R is a one-to-many nesting relationship, the iid of the item S is stored as a foreign
key, named P_T_iid (P means parent, so it can be thought of as a reverse link), in T.

[00151] If the Optional field of this relationship R in the DTDM-Nesting table 94
is false, then a NOT NULL constraint is added on the definition of the table.

[00152] If there is more than one relationship between the two items S and T, then
indices are placed after each column name, e.g., T_iid_1.

[00153] The metadata tables 34 are queried to get pairs of items of all of the nesting
relationships (see, e.g., note 150 associated with step 148 in Fig. 8):

    SELECT F.name, T.name FROM DTDM-Item F, DTDM-Item T, DTDM-Nesting N
    WHERE F.id = N.fromid AND T.id = N.toid

[00154] In accordance with the invention, this relationship mapping depends upon
the ratio inherent to the nesting relationship, so that the corresponding table is updated in
the following manner:

[00155] If a one-to-one relationship is detected at decision block 154:

    ALTER TABLE FromItem ADD (ToItem_iid)

[00156] If a one-to-many relationship is detected at decision block 154:

    ALTER TABLE ToItem ADD (parent_FromItem_iid)
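The choice made at decision block 154 can be sketched in a few lines of Python. The function name `nesting_column` and its return shape are hypothetical; the column naming follows the two ALTER TABLE statements above (`ToItem_iid` for one-to-one, `parent_FromItem_iid` for one-to-many), and the NOT NULL flag reflects the Optional field. The index suffix used when two items are related more than once is not modeled.

```python
def nesting_column(from_item, to_item, ratio, optional):
    """Return (table, column, not_null) for one nesting relationship.

    Third mapping: a 1:1 relationship puts the foreign key in the parent
    (from) table; a 1:n relationship puts it in the child (to) table.
    """
    if ratio == "1:1":
        table, column = from_item, f"{to_item}_iid"
    elif ratio == "1:n":
        table, column = to_item, f"parent_{from_item}_iid"
    else:
        raise ValueError(f"unsupported ratio: {ratio!r}")
    # A relationship that is not optional yields a NOT NULL constraint.
    return table, column, not optional

plan = [
    nesting_column("book", "booktitle", "1:1", optional=False),
    nesting_column("G1", "author", "1:n", optional=True),
]
```

With these two sample relationships, `plan` places `booktitle_iid` in the book table and `parent_G1_iid` in the author table, which agrees with the column names appearing in the data dictionary for Example 1.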

[00157] In this approach, many-to-many relationships between different elements
are captured by joins on attributes of type ID and IDREF(S). Different combinations of
the ID/IDREF(S) attributes represent different cardinalities between two elements. The
following table represents the possible relationships between two elements x and y depending
upon their ID/IDREF(S) attributes (of note, the contactauthors and author
elements in Example 1 listed in the Background section have a many-to-many
relationship):

| x:y        | x: ID | x: IDREF | x: IDREFS* |
| y: ID      | n/a   | n:1      |            |
| y: IDREF   | 1:n   | n/a      | n/a        |
| y: IDREFS* |       | n/a      | n/a        |

*Or multiple IDREF type of attributes

[00158] A fourth mapping is performed on the metadata tables 34 for the ID type
of attributes. For the ID type of attributes, those attributes are designated as a key of
their parent tables.

[00159] A fifth mapping is performed on the metadata tables 34 for the IDREF(S)
type of attributes. For the IDREF(S) type of attributes, join constraints will be added to
show meaningful joins between those tables. Each combination of attributes with type ID
and type IDREF(S) is stored as one tuple in a TS-JC table 102 (short for Table
Schema/Join Constraints). The FromColumn and ToColumn store the id of the attribute of
type IDREF(S) and of type ID, respectively.

[00160] The TS-JC table 102 stores these equi-join conditions representing the
element reference properties:

| ID         | Internal ID of this join condition. |
| FromColumn | Column join from.                   |
|            | Times of ...                        |


[00161] The metadata tables 34 are queried to retrieve all of the attributes having
type ID by the query:

    SELECT I.name, A.name FROM DTDM-Item I, DTDM-Attribute A WHERE I.id =
    A.pid AND A.type = "ID"

[00162] All of the attributes having type IDREF(S) are retrieved by the query:

    SELECT I.name, A.name FROM DTDM-Item I, DTDM-Attribute A WHERE I.id =
    A.pid AND (A.type = "IDREF" OR A.type = "IDREFS")

[00163] Then, those retrieved recordset items are placed into the TS-JC table 102.
The result after performing these mappings on the metadata tables 34 (specifically as
updated per Example 1) is shown in the following tables, which show essentially a data
dictionary of the relational schema 22 for the relational database 14 and also the TS-JC
table 102. The data dictionary lists the table names and the columns in each table.

[00164] The relational database data dictionary would look as follows if the
metadata for Example 1 is employed:



| Table Name     | Required Columns | Data Columns | Relationship Columns |
| PCDATA         | iid, order       | value        |                      |
| book           | iid, order       |              | booktitle_iid, G1_iid |
| booktitle      | iid, order       | PCDATA_iid   |                      |
| article        | iid, order       |              | title_iid, contactauthor_iid |
| title          | iid, order       | PCDATA_iid   |                      |
| contactauthors | iid, order       | authorIDs    |                      |
| monograph      | iid, order       |              | title_iid, author_iid, editor_iid |
| editor         | iid, order       |              |                      |
| author         | iid, order       | id           | parent_G1_iid, name_iid |
| name           | iid, order       |              | firstname_iid, lastname_iid |
| firstname      | iid, order       | PCDATA_iid   |                      |
| lastname       | iid, order       | PCDATA_iid   |                      |
| affiliation    | iid, order       |              |                      |
| G1             | iid              |              | editor_iid           |
| G2             | iid              |              | parent_article_iid, author_iid, affiliation_iid |
| G3             | iid              |              | parent_editor_iid, book_iid, monograph_iid |
| AG             | iid              |              | parent_affiliation_iid, PCDATA_iid, book_iid, booktitle_iid, article_iid, title_iid, contactauthors_iid, monograph_iid, editor_iid, author_iid, name_iid, firstname_iid, lastname_iid, affiliation_iid |



[00165] The TS-JC table 102 would appear as follows:

| ID | FromColumn | ToColumn                 |
|    | author.id  | contactauthors.authorIDs |





[00166] From the examples shown above, it can be seen that there are several
relationships between different tables and also that the IDREFS type attribute of the DTD 18,
e.g., authorIDs, is not fully represented by the schema 22 proposed above.

[00167] Two restructuring techniques (i.e., as part of the optional optimizer 26)
can be employed according to the invention on the metadata tables 34 to address these
shortcomings. First, multiple-value attributes can be identified. Second, elements that
should be attributes of other elements can be identified.

[00168] In the DTD 18 (and, of course, any DTD), some attribute types can
contain multiple values, e.g., IDREFS, NMTOKENS, ENTITIES. Such attributes can be
analogized to a set rather than an attribute, and it is desired to represent their values as
sets. Instead of treating the whole attribute as a unitary string of some undetermined
length, the values of that attribute are accessed individually. Hence, it is desired to
identify these multiple-value types of attributes and convert them into separate tables to
access those values. Of course, the transformation of attribute types other than IDREFS
would be handled in a similar manner.

[00169] By way of example, assume that a DTDM-Item E has a multiple-value
attribute A. For each A, another item named E:A is created. There is only one attribute in
this item, named value. A one-to-many relationship is then created between the item E
and the item E:A. The attribute A is then removed from the attribute list of the item E. The
following paragraphs describe this type of mapping in detail.

[00170] It can be seen from the above metadata expression of Example 1 that the
attribute authorIDs of item contactauthors is of type IDREFS. So, a new item
contactauthors_authorIDs is created and the attribute authorIDs with type IDREFS is
changed into the attribute value of item contactauthors_authorIDs with type IDREF.
This allows the expression of multiple values of attribute authorIDs by a new table.
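The restructuring of a multiple-value attribute might be sketched as follows. The in-memory structures (a dictionary of items, a list of attribute records, a list of nesting triples) are assumptions made for this illustration rather than the layout of the metadata tables 34. The type name "ATTRIBUTE" given to the new item, and the single-value counterparts chosen for NMTOKENS and ENTITIES, are likewise assumed; only the IDREFS to IDREF case is stated in the text.

```python
# Assumed single-value counterpart for each multiple-value attribute type.
MULTI_VALUE_TYPES = {"IDREFS": "IDREF", "NMTOKENS": "NMTOKEN", "ENTITIES": "ENTITY"}

def split_multi_value_attributes(items, attributes, nestings):
    """Move each multiple-value attribute A of item E into a new item E_A."""
    items = dict(items)
    nestings = list(nestings)
    kept = []
    for attr in attributes:
        single = MULTI_VALUE_TYPES.get(attr["type"])
        if single is None:
            kept.append(attr)              # ordinary attribute: left in place
            continue
        new_item = f'{attr["item"]}_{attr["name"]}'
        items[new_item] = "ATTRIBUTE"      # assumed type name for the new item
        kept.append({"item": new_item, "name": "value", "type": single})
        nestings.append((attr["item"], new_item, "1:n"))
    return items, kept, nestings

items, attrs, nestings = split_multi_value_attributes(
    {"contactauthors": "ELEMENT.EMPTY", "author": "ELEMENT.ELEMENT"},
    [
        {"item": "contactauthors", "name": "authorIDs", "type": "IDREFS"},
        {"item": "author", "name": "id", "type": "ID"},
    ],
    [],
)
```

After the call, the authorIDs attribute has been replaced by a `value` attribute of type IDREF on the new item `contactauthors_authorIDs`, and a one-to-many nesting from contactauthors to that item has been recorded.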

[00171] In a DTD, there are one-to-one relationships between different elements. If
an element of type A contains only an empty-content element of type B, then the latter
element can be considered to act as a complex attribute of the former. Hence, a
technique is proposed to convert these kinds of "complex attribute" elements into a real
attribute, which is referred to herein as an inline attribute process.

[00172] Inlining an attribute means that, if item A to item B has a 1:1 nesting
relationship and item B has no child item, then all of the attributes of item B can be inlined
into item A. Then, the attribute x of item B is inlined into A as B_x. This inline technique
cannot be applied to one-to-many relationships, because multiple occurrences of the
attributes could exist.

[00173] After inlining the attributes by the process discussed above, the number of
nesting relationships is consequently reduced; hence, the table schema 22 is simpler. It can
be appreciated that a group item could also contain other items, so, after inlining
attributes, the group item could have attributes which are converted from its children
items.

[00174] The inlining process includes multiple iterations, until no additional items
can be inlined. In terms of operations on the metadata tables 34, starting from the PCDATA
element (leaves), the DTDM-Nesting table 94 is searched for items that have never
appeared in the "FromID" field and have only appeared in the "ToID" field with "Ratio" as "1:1".
[00175] In order to apply the inline operation, the item B has to have no child
items, and the relationship between item A and item B must be one-to-one, as follows:

    SELECT ToID
    FROM Nesting N
    WHERE N.ToID IN
        (SELECT UNIQUE ToID AS PID
         FROM Nesting
         EXCEPT
         SELECT UNIQUE FromID AS PID
         FROM Nesting) AND
        NOT EXISTS
        (SELECT *
         FROM Nesting
         WHERE ToID = N.ToID AND Ratio = "1:n");
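The selection of inline candidates can be exercised with Python and an in-memory SQLite database, as sketched below. The sketch adapts the query to SQLite (DISTINCT in place of UNIQUE, a lower-case table named `nesting`), and the sample rows are a hypothetical excerpt of the nesting relationships of Example 1, identified by item id only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nesting (id INTEGER PRIMARY KEY, fromid INTEGER, toid INTEGER, ratio TEXT);
    -- Hypothetical excerpt: booktitle(3), title(5) and firstname(11) each nest PCDATA(1);
    -- book(2) nests booktitle(3); author(9) nests name(10); name(10) nests firstname(11);
    -- article(4) nests author(9) as a one-to-many relationship.
    INSERT INTO nesting (fromid, toid, ratio) VALUES
        (3, 1, '1:1'), (5, 1, '1:1'), (11, 1, '1:1'),
        (2, 3, '1:1'), (9, 10, '1:1'), (10, 11, '1:1'), (4, 9, '1:n');
""")

# Items that appear as a child but never as a parent, and never on the
# "many" side of a one-to-many relationship.
INLINE_CANDIDATES = """
    SELECT DISTINCT N.toid
    FROM nesting N
    WHERE N.toid IN (SELECT toid FROM nesting
                     EXCEPT
                     SELECT fromid FROM nesting)
      AND NOT EXISTS (SELECT * FROM nesting M
                      WHERE M.toid = N.toid AND M.ratio = '1:n')
    ORDER BY N.toid
"""

candidates = [row[0] for row in conn.execute(INLINE_CANDIDATES)]
```

For this excerpt the first pass selects only item 1 (PCDATA). Further passes would be run after the selected items have been inlined and their nesting rows removed, which this sketch does not perform.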

[00176] For example, using the metadata tables generated in response to Example 1,
there are two proposed iterations. During the first iteration, only item 1 (PCDATA) is
returned. During the second iteration, items 3 (booktitle), 5 (title), 11 (firstname)
and 12 (lastname) are returned. After that, no more items satisfy the above conditions.

[00177] After inlining the attributes, the GROUP- or ATTRIBUTE-typed items
that have no relationship with other items can be removed from the DTDM-Item
table 90. However, the items of type ELEMENT.* or PCDATA cannot be removed, because
the elements and PCDATA are required during the database data loading phase discussed
hereafter in detail. After loading the data from the document 12, those tables can be
removed to reduce the degree of redundancy of data.

[00178] There is an additional refinement to make the schema 22 more meaningful.
That is, in the table schema 22, all of the attribute names of the form "*.PCDATA_value" are
simplified into "*". Then, a query can be performed with simpler semantics, like:

    SELECT booktitle FROM book

rather than

    SELECT booktitle.PCDATA_value FROM book

[00179] The tables following this paragraph show the metadata tables 34 after
being restructured as discussed above. Compared to the initial version of the metadata
tables 34 above, it can be seen that the number of nesting relationships (in the DTDM-Nesting
table 94) has been reduced to half, while the number of attributes has been
increased. This results in bigger tables and fewer joins across the tables, which is beneficial
for join query performance and is easier to understand.



[00180] The restructured DTDM-Item table 90 (also referred to herein as the IM-Item
table 96):

| ID | Name           | Type            |
| 1  | PCDATA         | PCDATA          |
| 2  | book           | ELEMENT.ELEMENT |
| 3  | booktitle      | ELEMENT.MIX     |
| 4  | article        | ELEMENT.ELEMENT |
| 5  | title          | ELEMENT.MIX     |
| 6  | contactauthors | ELEMENT.EMPTY   |
| 7  | monograph      | ELEMENT.ELEMENT |
| 8  | editor         | ELEMENT.ELEMENT |
| 9  | author         | ELEMENT.ELEMENT |
| 10 | name           | ELEMENT.ELEMENT |
| 11 | firstname      | ELEMENT.MIX     |
| 12 | lastname       | ELEMENT.MIX     |
|    | affiliation    |                 |
|    |                | ATTRIBUTE       |




[00181] The restructured DTDM-Attribute table 92 (also referred to herein as the
IM-Attribute table 98):

| ID | PID | Name           | Type   | Default   |
| 1  | 18  | value          | IDREF  | #REQUIRED |
| 2  | 8   | name           | CDATA  | #REQUIRED |
| 3  | 9   | id             | ID     | #REQUIRED |
| 4  | 1   | value          | PCDATA | #REQUIRED |
| 5  | 3   | PCDATA_value   | PCDATA | #REQUIRED |
| 6  | 5   | PCDATA_value   | PCDATA | #REQUIRED |
| 7  | 11  | PCDATA_value   | PCDATA | #REQUIRED |
| 8  | 12  | PCDATA_value   | PCDATA | #REQUIRED |
| 9  | 17  | PCDATA_value   | PCDATA | #REQUIRED |
| 10 | 2   | booktitle      | PCDATA | #REQUIRED |
| 11 | 17  | booktitle      | PCDATA | #REQUIRED |
| 12 | 4   | title          | PCDATA | #REQUIRED |
| 13 | 7   | title          | PCDATA | #REQUIRED |
| 14 | 17  | title          | PCDATA | #REQUIRED |
| 15 | 10  | firstname      | PCDATA | #REQUIRED |
| 16 | 17  | firstname      | PCDATA | #REQUIRED |
| 17 | 10  | lastname       | PCDATA | #REQUIRED |
| 18 | 17  | lastname       | PCDATA | #REQUIRED |
| 19 | 9   | name_firstname | PCDATA | #REQUIRED |
| 20 | 9   | name_lastname  | PCDATA | #REQUIRED |
| 21 | 17  | name_firstname | PCDATA | #REQUIRED |
| 22 | 17  | name_lastname  | PCDATA | #REQUIRED |



[00182] The restructured DTDM-Nesting table 94 (also referred to herein as the
IM-Nesting table 100):

| ID | FromID | ToID | Ratio | Optional | Index |


[00183] The following data dictionary is representative of the table schema
thereby generated in accordance with this inventive method:

| Table Name               | Required Columns | Data Columns | Relationship Columns |
| PCDATA                   | iid, order       | value        |                      |
| book                     | iid, order       | booktitle    | G1_iid               |
| booktitle                | iid, order       | PCDATA_value |                      |
| article                  | iid, order       | title        | contactauthor_iid    |
| title                    | iid, order       | PCDATA_value |                      |
| contactauthors           | iid, order       |              |                      |
| monograph                | iid, order       | title        | author_iid, editor_iid |
| editor                   | iid, order       |              |                      |
| author                   | iid, order       | id, name_firstname, name_lastname | parent_G1_iid |
| name                     | iid, order       | firstname, lastname |               |
| firstname                | iid, order       | PCDATA_value |                      |
| lastname                 | iid, order       | PCDATA_value |                      |
| affiliation              | iid, order       |              |                      |
| G1                       | iid              |              | editor_iid           |
| G2                       | iid              |              | parent_article_iid, author_iid, affiliation_iid |
| G3                       | iid              |              | parent_editor_iid, book_iid, monograph_iid |
| AG                       | iid              | PCDATA_value, booktitle, title, firstname, lastname, name_firstname, name_lastname | parent_affiliation_iid, book_iid, article_iid, contactauthors_iid, monograph_iid, editor_iid, author_iid, affiliation_iid |
| contactauthors_authorIDs | iid              | value        | parent_contactauthors_iid |



[00184] The TS-JC table 102 follows:

| ToColumn       | Occurrence |
| contactauthors |            |




[00185] Now, the details of this inventive metadata-driven method that loads an
XML document 12 into the relational schema 22 generated above will be discussed.

[00186] The loading process has two general phases as depicted in Fig. 1. First, the
pattern-mapping table is generated that is used to capture the mapping between a DTD
18 and the relational schema 22. This includes generating an initial pattern-mapping table
36, and updating this pattern-mapping table to keep track of the actions during the
restructuring of metadata tables 34. Second, XML documents 12 that comply
with the DTD 18 are loaded using the pattern-mapping table 36.

[00187] An example of a compliant data portion 16 of the XML document 12 is
shown below:

[00188] Example 2. A valid XML Document complying with the DTD of
Example 1.

    <?xml version="1.0"?>
    <!DOCTYPE article SYSTEM "book.dtd">
    <article>
      <title>XML Relation Mapping</title>
      <author id="xz">
        <name>
          <firstname>Xin</firstname>
          <lastname>Zhang</lastname>
        </name>
      </author>
      <affiliation>
        Department of Computer Science
        Worcester Polytechnic Institute
        Worcester, MA 01609-2280
      </affiliation>
      <author id="gm">
        <name>
          <firstname>Gail</firstname>
          <lastname>Mitchell</lastname>
        </name>
      </author>
      <affiliation>
        Verizon Laboratories Incorporated
        40 Sylvan Rd.
        Waltham, MA 02451
      </affiliation>
      <author id="wl">
        <name>
          <firstname>Wang-chien</firstname>
          <lastname>Lee</lastname>
        </name>
      </author>
      <affiliation>
        Verizon Laboratories Incorporated
        40 Sylvan Rd.
        Waltham, MA 02451
      </affiliation>
      <contactauthors authorIDs="xz gm wl"/>
    </article>

FIGURES 12-14

[00189] With reference to the drawings and, specifically, to Figs. 12-14, an XML
tree model 300 is proposed herein, followed by the introduction of the concept of
patterns detected during the importation of the data, the definition of the loading actions,
the description of the generation of a pattern-mapping table, the provision of details of
the loading algorithm, and finally by a loading example.

[00190] By way of explanation, it is known to parse XML documents into a
document object model (DOM)-compliant XML tree structure. Here, a simplified DOM
model is proposed for illustration purposes. This model (of the type shown in Fig. 14) is
composed of nodes 302 and edges 304. Every node 302 of the tree corresponds to one
element in the XML document 12. An edge 304 corresponds to a nesting relationship
between elements. The relationships between elements with ID/IDREF typed attributes
are not shown in this model.

[00191] Every node 302 preferably has a type and possibly one or more attributes.
Every attribute preferably has a name and its corresponding value. The PCDATA node has
only one attribute, named "value". Fig. 14 depicts the XML document 12 defined in
Example 2 in terms of this tree model 300. For internal nodes 302, the element type is
written above the node followed by any attributes and their values. All leaf nodes 302
have type PCDATA, and have their values in the "value" attribute below the node 302.
Arcs between nodes 302 illustrate nesting relationships between the nodes 302. In
addition, every node 302 has its document order contained within it (i.e., the order
by which a tree traversal routine encounters a particular node 302).

[00192] In the model of Fig. 14, other known types of nodes, e.g., notation,
comments, etc., as considered in the well known DOM model have not been included,
because this invention focuses primarily on the elements and their attributes.
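A simplified tree of this kind can be built in Python with the standard `xml.etree.ElementTree` parser, as in the sketch below. The `Node` class and the `build_tree` function are hypothetical names, document order is assumed to be a pre-order numbering starting at 1, and text that follows a child element (the parser's "tail" text) is ignored for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Node:
    type: str                 # element type, or "PCDATA" for text leaves
    order: int                # document order (assumed pre-order numbering)
    attributes: dict
    children: list = field(default_factory=list)

def build_tree(xml_text):
    """Parse XML text into the simplified node/edge model."""
    counter = 0

    def visit(element):
        nonlocal counter
        counter += 1
        node = Node(element.tag, counter, dict(element.attrib))
        text = (element.text or "").strip()
        if text:
            # Character data becomes a PCDATA leaf with a single "value" attribute.
            counter += 1
            node.children.append(Node("PCDATA", counter, {"value": text}))
        for child in element:
            node.children.append(visit(child))
        return node

    return visit(ET.fromstring(xml_text))

DOC = (
    "<article><title>XML Relation Mapping</title>"
    '<author id="xz"><name><firstname>Xin</firstname>'
    "<lastname>Zhang</lastname></name></author></article>"
)
tree = build_tree(DOC)
```

In the resulting tree, the article node is first in document order, its title child holds a PCDATA leaf carrying the title text, and the author node carries the `id` attribute.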

[00193] When the XML tree model of Fig. 14 is examined, it can be observed that
there are three kinds of objects in the tree: nodes, links, and attributes. Different types
of nodes, links and attributes are stored in different parts of the relational tables 20
generated for the relational database 14 in accordance with the previously generated
relational schema 22. A pattern associated therewith is preferably the type of node, link
and/or attribute, and these are referred to herein as a node pattern, a link pattern, and an attribute
pattern.

[00194] The node pattern is identical to one item, whose type is ELEMENT.* or
PCDATA, in the DTDM-Item table 90. It is represented as its item name in the pattern-mapping
table 36. The attribute pattern is identical to one attribute in the DTDM-Attribute
table 92 and is represented as its full attribute name (including its associated
item name) in the pattern-mapping table 36. The link pattern is used to show all possible
links between two types of elements that are permitted by the DTD 18 and is
represented in the tables below as two element types with an arrow ("→") in the middle.

[00195] There are two types of nesting relationships captured in the DTDM-Nesting
table 94, distinguished by whether the relationship involves a
group-typed item. If a nesting relationship does not involve a group-typed item,
then it can be directly mapped into a link between the two items participating in that
relationship. For a nesting relationship that involves group-typed items, the following
steps are proposed to generate the link patterns:

[00196] A temporary table link is created by doing a self-join of the DTDM-Nesting
table 94 on the group-typed items:

    CREATE TABLE link AS
    SELECT A.FromID, B.ToID
    FROM DTDM-Nesting A, DTDM-Nesting B, DTDM-Item C
    WHERE A.ToID = B.FromID AND A.ToID = C.ID AND C.type = "GROUP"

[00197] For example, nesting relationship pairs (2, 3), (2, 4) for GroupID = 14 can
be determined from the metadata tables 34 according to the example described previously.

[00198] Further self-joins are performed on the table link until none of the FromID and
ToID entries are group-typed items. It is contemplated that as many as n - 1
iterations may need to occur until all possible self-joins are performed (for a maximum of
n level groups).

[00199] Accordingly, the links located in this generated link table are the
remainder of the link patterns needed.
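The repeated self-join can be mimicked in plain Python on sets of (from, to) pairs, as sketched below. The function name `link_patterns` and the use of item names instead of ids are assumptions of this sketch; it also presumes that group items do not contain themselves, which holds for groups derived from nested parentheses in a content model.

```python
def link_patterns(nestings, group_items):
    """Replace group items in nesting pairs until only element-to-element links remain."""
    links = set(nestings)
    while True:
        # Pairs whose child side is still a group item.
        through_group = {(a, b) for (a, b) in links if b in group_items}
        if not through_group:
            break
        expanded = set()
        for parent, group in through_group:
            # Connect the parent directly to every member of the group.
            expanded |= {(parent, child) for (g, child) in links if g == group}
        links = (links - through_group) | expanded
    # Pairs that start at a group item were only needed for the expansion.
    return {(a, b) for (a, b) in links if a not in group_items}

patterns = link_patterns(
    {("book", "booktitle"), ("book", "G1"), ("G1", "author"), ("G1", "editor")},
    group_items={"G1"},
)
```

For the book element of Example 1, the sketch yields the links from book to booktitle, author and editor, with the group G1 no longer appearing on either side.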

[00200] The following table provides the pattern-mapping table 36 with all of the
patterns from the metadata tables 34 generated according to the DTD 18 provided in
Example 1 and discussed throughout. By way of clarification, different types of patterns
have been separated by a solid line in the following table in the following order: (1) node
patterns; (2) attribute patterns; and (3) link patterns.

    Pattern
    ------------------------------
    PCDATA
    booktitle
    article
    title
    contactauthors
    monograph
    editor
    firstname
    lastname
    affiliation
    ------------------------------
    contactauthors.authorIDs
    editor.name
    author.id
    PCDATA.value
    ------------------------------
    book→booktitle
    book→author
    book→editor
    booktitle→PCDATA
    article→title
    article→author
    article→affiliation
    article→contactauthors
    title→PCDATA
    monograph→title
    monograph→author
    monograph→editor
    editor→book
    editor→monograph
    author→name
    name→firstname
    firstname→PCDATA
    lastname→PCDATA
    affiliation→PCDATA
    affiliation→book
    affiliation→booktitle
    affiliation→article
    affiliation→title
    affiliation→contactauthors
    affiliation→monograph
    affiliation→editor
    affiliation→author
    affiliation→name
    affiliation→firstname
    affiliation→lastname
    affiliation→affiliation


[00201] Patterns are used to identify the type of nodes, links and attributes. Then,
the inventive process contemplates the definition of loading actions as will be described
below. These loading actions describe how to fill the data in the pattern mapping tables
into relational tables 20 for the relational database 14. These loading actions are
summarized below as create, update, assign and decompose.

[00202] create(T) creates a tuple in table T with a new "iid" and an optional
"order" if handling an element.

[00203] update(T.A) updates a column A of the current tuple in table T with the
value of the attribute in the XML tree (like the one shown in Fig. 14).

[00204] assign(T.A, S.B) assigns a column A in table T the value of another attribute B
in table S.

[00205] decompose(T.A) creates multiple tuples for a multi-value attribute
and stores those values into column A in table T. For example, if a multi-value attribute
has value "v1 v2", then it will create two tuples with value "v1" and "v2" respectively.
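
The four loading actions can be illustrated with a small Python sketch. It assumes an in-memory dictionary of tables in place of the relational database 14, and it takes the "current" tuple of a table to be the most recently created one; the class name Loader and its method signatures are hypothetical and are not taken from the specification.

```python
class Loader:
    """Tiny in-memory stand-in for the relational tables (hypothetical)."""

    def __init__(self):
        self.tables = {}    # table name -> list of row dicts
        self.next_iid = {}  # table name -> next iid to hand out

    def create(self, table, order=None):
        """create(T): add a tuple with a new iid and an optional order."""
        iid = self.next_iid.get(table, 1)
        self.next_iid[table] = iid + 1
        row = {"iid": iid}
        if order is not None:
            row["order"] = order
        self.tables.setdefault(table, []).append(row)
        return row

    def update(self, table, column, value):
        """update(T.A): set column A of the current tuple of T."""
        self.tables[table][-1][column] = value

    def assign(self, table, column, src_table, src_column):
        """assign(T.A, S.B): copy S.B of the current tuple of S into T.A."""
        self.tables[table][-1][column] = self.tables[src_table][-1][src_column]

    def decompose(self, table, column, raw):
        """decompose(T.A): one tuple per whitespace-separated value."""
        for value in raw.split():
            self.create(table)[column] = value
```

Calling decompose("contactauthors_authorIDs", "value", "v1 v2") on a fresh Loader leaves two tuples in that table, one holding "v1" and one holding "v2".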
[00206] The possible mappings between the pattern detected and the loading
actions are described in the following paragraphs.

[00207] When a node is encountered, one new tuple is created in the corresponding
table.

[00208] When an attribute is encountered, two possible cases result. First, this
attribute can be mapped into a column, e.g., update(T.A), and then the column of the
tuple in the specific table is updated. Second, this attribute can be mapped into a table,
and then multiple tuples in the specific table can be created, one for each value in this attribute.

[00209] When a link is encountered, three possible cases result. First, the foreign
key in one table can be updated with the key value in another table. Second, if there is a
group in this link, then a new tuple is created in the group table and the
corresponding foreign keys are updated. Third, if the child node is inlined in the parent
node, then all of the attributes of the child table are copied into the parent table. The
details of how to generate those actions are discussed below.



Node: T            create(T)

Attribute: T.A     decompose(T_A.value), assign(T_A.parent_T_iid, T.iid)

Link: A→B          create(G), assign(A.G_iid, G.iid), assign(G.B_iid, B.iid)
                   create(G), assign(A.G_iid, G.iid), assign(B.parent_G_iid, G.iid)
                   create(G), assign(G.parent_A_iid, A.iid), assign(G.B_iid, B.iid)
                   create(G), assign(G.parent_A_iid, A.iid), assign(B.parent_G_iid, G.iid)
                   assign(A.attribute, B.attribute)


[00210] As described with respect to Fig. 14, an XML document 12 in the DOM
model 300 consists of nodes 302 and links 304, but it is desired to load the XML data 16
into the relational tables 20 according to the generated schema 22 of the relational database
14. Therefore, mappings are needed between the XML document 12 and the relational
tables 20 of the relational database 14.

[00211] Beneficially, those needed mappings are captured in the pattern-mapping
table 36, which captures the mapping from patterns to loading actions. Because
some of the actions create tuples, and some actions use those tuples, the order in the
mapping field of the pattern-mapping table 36 is very important.

Figures 9-11

[00212] The pattern-mapping table 36 is generated right after the metadata is
loaded, and modified during the restructuring. The generation of the pattern-mapping
table 36 during each step of the schema-generation process will now be discussed as
indicated by the step of initializing the pattern mapping table 36 identified by reference
number 58 in Fig. 2 with reference to Figs. 9-11.

[00213] Turning to Fig. 9, the process of initializing the pattern mapping table as
identified in Fig. 2 by reference number 58 can be described by two general steps: first,
initializing the link table as shown by reference number 168 and then initializing the
pattern mapping table as shown by reference number 170.

[00214] Turning to Fig. 10, the step of initializing the link table identified by
reference number 168 in Fig. 9 is described in further detail. Processing moves to step
172 in which a first link table is created (e.g., the initial link table is referred to as Link0).
A proposed SQL statement to accomplish this task is shown in note 174 associated with
step 172 in Fig. 10.

[00215] Processing then moves to step 176 in which a variable to control the
number of iterations is initialized, namely, iteration_number equals zero. Processing
then moves to step 178 in which the number of groups from the link table, i.e., the
recordset returned in step 172, is retrieved. A proposed SQL statement to accomplish
this task is shown in note 180 associated with step 178 in Fig. 10.

[00216] Processing then moves to decision block 182 in which it is determined
whether additional groups (beyond the first) need to be processed. If not, processing
ends. If so, processing moves to step 184 in which the iteration control number
iteration_number is incremented. Processing then moves to step 186 in which
additional link tables are created (i.e., Link1, Link2, Link3, ...) for each of the additional
iterations. A proposed SQL statement to accomplish this task is shown in note 188
associated with step 186 in Fig. 10.

[00217] Processing then moves to step 190 in which the number of groups is
selected from the next newly-created link table. A proposed SQL statement to
accomplish this task is shown in note 192 associated with step 190 in Fig. 10. Processing
then moves to decision block 194 to determine whether additional groups need to be
processed. If so, processing returns to step 184. If not, processing moves to step 196 in
which the most recently-created link table is loaded into the link pattern table, which is
reflected in the process steps of Fig. 11. After step 196, processing ends.
[00218] Turning to Fig. 11, processing moves to step 198 in which all of the
ELEMENT and PCDATA items from the DTDM-Item tables 90 are returned in the recordset.
Processing then moves to step 200 which initiates a loop for each item returned in the
recordset generated in step 198. Processing then moves to step 202 in which one tuple
corresponding to these items is created in a CreateAction table (see Fig. 1B for the
interrelation of the CreateAction table with the rest of the pattern mapping tables 36).
Processing then moves to step 204 in which mapping corresponding to the tuple
generated in step 202 is inserted into the pattern-mapping table 36. Processing then
moves to decision block 206 in which it is determined whether additional items of the
recordset returned in step 198 need to be processed. If so, processing returns to step
200. If not, processing moves to step 208 in which a query is performed on the
DTDM-Attribute table 92 to return a recordset containing all of the attributes thereof. Processing
then moves to step 210 in which a loop is initiated for each attribute returned in the
recordset generated in step 208. Processing then moves to step 212 in which a tuple is
created in an UpdateAction table (see Fig. 1B for the interrelation of the UpdateAction
table with the rest of the pattern mapping tables 36). Processing then moves to step 214
in which mapping corresponding to the tuple generated in step 212 is inserted into the
pattern mapping table. Processing then moves to decision block 216 which determines
whether additional attributes of the recordset generated in step 208 need to be processed
according to the loop 210. If so, processing returns to step 210. If not, processing moves
to step 218.

[00219] At step 218, a query is performed on a LinkPattern table (see Fig. 1B for
the interrelation of the LinkPattern table with the rest of the pattern mapping tables 36)
to return a recordset containing all of the links thereof. Processing then moves to step
220 in which a loop is initiated for each link returned in the recordset generated in step
218. Processing then moves to step 222 in which a query is performed to determine all of
the groups involved in this particular link. Processing then moves to step 224 in which a
loop is initiated for the name of each group.

[00220] Processing then moves to step 226 in which a query is performed to
determine a unique identifier for the group. Once this unique identifier is determined,
processing moves to step 228 in which a tuple is created in the CreateAction table with
that unique identifier. Processing then moves to step 230 in which the tuple is inserted
into the pattern mapping table 36. After which, processing moves to decision block 232
in which it is determined whether additional groups need to be processed for the loop of
each group. If so, processing returns to step 224. If not, processing moves ahead to step
234. At step 234, a query is performed to determine all of the (from, to) pairs for this
particular link of the loop initiated at step 220. This can be stored in the form of a
recordset. Processing then moves to step 236 which initiates a loop for each pair
contained in the recordset developed at step 234. Processing then moves to step 238 in
which an assign action is created corresponding to that (from, to) pair. Processing then
moves to step 240 in which this assign action is inserted into the pattern mapping table
corresponding to the data developed in step 238.
[00221] Processing then moves to decision block 242 in which it is determined
whether additional pairs need to be processed in the loop initiated at step 236. If so,
processing returns to step 236. If not, processing moves to decision block 244 in which it
is determined whether additional links need to be processed in the loop initiated at step
220. If so, processing returns to step 220. If not, processing ends.

[00222] By way of summary, the pattern mapping tables 36 are initialized by
putting the generated pattern and the corresponding actions together. For all of the node
patterns, e.g., T, put action create(T). For all of the attribute patterns, e.g., T.A,
put action update(T.A). For the link pattern, e.g., A→B, there are different cases. If
the link pattern is not related to a group, then based on the mapping rule, if the
relationship is one-to-one, then put assign(A.B_iid, B.iid). If the relationship is
one-to-many, then put assign(B.parent_A_iid, A.iid). If the link pattern is
related to a group G, then put create(G) first, then handle the relationship between A→G and
G→B separately as described before.
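
The summary above can be sketched as a small Python helper that emits the action list for one link pattern. This is an illustration only: the function names pair_action and link_actions, and the boolean flags used to mark a relationship as one-to-many, are hypothetical; the action strings follow the create/assign notation used in this description.

```python
def pair_action(parent, child, one_to_many):
    """Return the assign action that connects one parent/child pair."""
    if one_to_many:
        # One-to-many: the child row carries a foreign key to its parent.
        return f"assign({child}.parent_{parent}_iid, {parent}.iid)"
    # One-to-one: the parent row carries a foreign key to the child.
    return f"assign({parent}.{child}_iid, {child}.iid)"


def link_actions(a, b, many=False, group=None, group_many=False):
    """Return the ordered loading actions for the link pattern a→b.

    many       -- whether the first hop (a→b, or a→group) is one-to-many
    group      -- name of the group table involved in the link, if any
    group_many -- whether the second hop (group→b) is one-to-many
    """
    if group is None:
        return [pair_action(a, b, many)]
    # The group tuple must exist before either hop can refer to it.
    return [
        f"create({group})",
        pair_action(a, group, many),
        pair_action(group, b, group_many),
    ]
```

For example, link_actions("book", "author", group="G1", group_many=True) returns create(G1), assign(book.G1_iid, G1.iid) and assign(author.parent_G1_iid, G1.iid), in that order.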

[00223] There are two types of reorganizations of metadata mentioned herein. One
type is intended for identity sets, the result of which is to convert some attributes into
items. The effect of this operation on the pattern-mapping table 36 follows. If an
attribute A of item I is changed into item I_A with attribute value, then the corresponding
action for that attribute is preferably changed as follows:

decompose(I_A.value),
assign(I_A.parent_I_iid, I.iid)

[00224] For example, the contactauthors.authorIDs is changed to an item
named contactauthors_authorIDs. Hence the action for attribute pattern
contactauthors.authorIDs changes to:

decompose(contactauthors_authorIDs.value),
assign(contactauthors_authorIDs.parent_contactauthors_iid,
contactauthors.iid)

[00225] The second type is for inline attributes, which essentially copies the
attributes of one item into another item. Hence, if we assume item B is inlined as attributes
of item A, and B has C1, ..., Cn as attributes, then all of the assign loading actions are
queried for in the format of assign(A.B_iid, B.iid), which generates a one-to-one
relationship between items A and B. Then, all of the attributes of item B are retrieved from
the DTDM-Attribute table 92. Then, for every attribute Ci of item B, the action is
replaced with:

assign(A.B_C1, B.C1),
assign(A.B_C2, B.C2),
...
assign(A.B_Cn, B.Cn),


[00226] For example, if PCDATA is inlined, then, for the booktitle→PCDATA link
pattern, the following action results:

assign(booktitle.PCDATA_value, PCDATA.value)
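
The replacement rule for inline attributes can be sketched as follows. The helper name inline_actions is hypothetical, and the attribute list is passed in directly instead of being read from the DTDM-Attribute table 92; the sketch only shows how one assign action per attribute of the inlined item is produced.

```python
def inline_actions(parent, child, child_attrs):
    """Return the assign actions that copy each attribute of the inlined
    child item into a column of the parent item."""
    return [
        f"assign({parent}.{child}_{attr}, {child}.{attr})"
        for attr in child_attrs
    ]
```

With parent "booktitle", child "PCDATA" and the single attribute "value", the helper returns the action assign(booktitle.PCDATA_value, PCDATA.value).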



[00227] The schema now needs to be cleaned up by replacing all actions with a
*(.*)+.PCDATA_value pattern with *(.*)+.

[00228] The pattern mapping table 36 generated from the metadata described in the
example follows:





Pattern                       Actions

PCDATA                        create(PCDATA)
book                          create(book)
booktitle                     create(booktitle)
article                       create(article)
title                         create(title)
contactauthors                create(contactauthors)
monograph                     create(monograph)
editor                        create(editor)
author                        create(author)
name                          create(name)
firstname                     create(firstname)
lastname                      create(lastname)
affiliation                   create(affiliation)
------------------------------------------------------------------------------
contactauthors.authorIDs      update(contactauthors.authorIDs)
editor.name                   update(editor.name)
author.id                     update(author.id)
PCDATA.value                  update(PCDATA.value)
------------------------------------------------------------------------------
book→booktitle                assign(book.booktitle_iid, booktitle.iid)
book→author                   create(G1), assign(book.G1_iid, G1.iid),
                              assign(author.parent_G1_iid, G1.iid)
book→editor                   create(G1), assign(book.G1_iid, G1.iid),
                              assign(G1.editor_iid, editor.iid)
booktitle→PCDATA              assign(booktitle.PCDATA_iid, PCDATA.iid)
article→title                 assign(article.title_iid, title.iid)
article→author                create(G2), assign(G2.parent_article_iid, article.iid),
                              assign(G2.author_iid, author.iid)
article→affiliation           create(G2), assign(G2.parent_article_iid, article.iid),
                              assign(G2.affiliation_iid, affiliation.iid)
article→contactauthors        assign(article.contactauthors_iid, contactauthors.iid)
title→PCDATA                  assign(title.PCDATA_iid, PCDATA.iid)
monograph→title               assign(monograph.title_iid, title.iid)
monograph→author              assign(monograph.author_iid, author.iid)
monograph→editor              assign(monograph.editor_iid, editor.iid)
editor→book                   create(G3), assign(G3.parent_editor_iid, editor.iid),
                              assign(G3.book_iid, book.iid)
editor→monograph              create(G3), assign(G3.parent_editor_iid, editor.iid),
                              assign(G3.monograph_iid, monograph.iid)
author→name                   assign(author.name_iid, name.iid)
name→firstname                assign(name.firstname_iid, firstname.iid)
name→lastname                 assign(name.lastname_iid, lastname.iid)
firstname→PCDATA              assign(firstname.PCDATA_iid, PCDATA.iid)
lastname→PCDATA               assign(lastname.PCDATA_iid, PCDATA.iid)
affiliation→PCDATA            create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.PCDATA_iid, PCDATA.iid)
affiliation→book              create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.book_iid, book.iid)
affiliation→booktitle         create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.booktitle_iid, booktitle.iid)
affiliation→article           create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.article_iid, article.iid)
affiliation→title             create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.title_iid, title.iid)
affiliation→contactauthors    create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.contactauthors_iid, contactauthors.iid)
affiliation→monograph         create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.monograph_iid, monograph.iid)
affiliation→editor            create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.editor_iid, editor.iid)
affiliation→author            create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.author_iid, author.iid)
affiliation→name              create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.name_iid, name.iid)
affiliation→firstname         create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.firstname_iid, firstname.iid)
affiliation→lastname          create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.lastname_iid, lastname.iid)
affiliation→affiliation       create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.affiliation_iid, affiliation.iid)


[00229] Then, the following pattern-mapping table results:

Pattern                       Actions

PCDATA                        create(PCDATA)
book                          create(book)
booktitle                     create(booktitle)
article                       create(article)
title                         create(title)
contactauthors                create(contactauthors)
monograph                     create(monograph)
editor                        create(editor)
author                        create(author)
name                          create(name)
firstname                     create(firstname)
lastname                      create(lastname)
affiliation                   create(affiliation)
------------------------------------------------------------------------------
contactauthors.authorIDs      decompose(contactauthors_authorIDs.value),
                              assign(contactauthors_authorIDs.parent_contactauthors_iid,
                              contactauthors.iid)
editor.name                   update(editor.name)
author.id                     update(author.id)
PCDATA.value                  update(PCDATA.value)
------------------------------------------------------------------------------
book→booktitle                assign(book.booktitle_iid, booktitle.iid)
book→author                   create(G1), assign(book.G1_iid, G1.iid),
                              assign(author.parent_G1_iid, G1.iid)
book→editor                   create(G1), assign(book.G1_iid, G1.iid),
                              assign(G1.editor_iid, editor.iid)
booktitle→PCDATA              assign(booktitle.PCDATA_value, PCDATA.value)
article→title                 assign(article.title, title.PCDATA_value)
article→author                create(G2), assign(G2.parent_article_iid, article.iid),
                              assign(G2.author_iid, author.iid)
article→affiliation           create(G2), assign(G2.parent_article_iid, article.iid),
                              assign(G2.affiliation_iid, affiliation.iid)
article→contactauthors        assign(article.contactauthors_iid, contactauthors.iid)
title→PCDATA                  assign(title.PCDATA_value, PCDATA.value)
monograph→title               assign(monograph.title, title.PCDATA_value)
monograph→author              assign(monograph.author_iid, author.iid)
monograph→editor              assign(monograph.editor_iid, editor.iid)
editor→book                   create(G3), assign(G3.parent_editor_iid, editor.iid),
                              assign(G3.book_iid, book.iid)
editor→monograph              create(G3), assign(G3.parent_editor_iid, editor.iid),
                              assign(G3.monograph_iid, monograph.iid)
author→name                   assign(author.name_firstname, name.firstname),
                              assign(author.name_lastname, name.lastname)
name→firstname                assign(name.firstname, firstname.PCDATA_value)
name→lastname                 assign(name.lastname, lastname.PCDATA_value)
firstname→PCDATA              assign(firstname.PCDATA_value, PCDATA.value)
lastname→PCDATA               assign(lastname.PCDATA_value, PCDATA.value)
affiliation→PCDATA            create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.PCDATA_value, PCDATA.value)
affiliation→book              create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.book_iid, book.iid)
affiliation→booktitle         create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.booktitle_iid, booktitle.iid)
affiliation→article           create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.article_iid, article.iid)
affiliation→title             create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.title_iid, title.iid)
affiliation→contactauthors    create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.contactauthors_iid, contactauthors.iid)
affiliation→monograph         create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.monograph_iid, monograph.iid)
affiliation→editor            create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.editor_iid, editor.iid)
affiliation→author            create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.author_iid, author.iid)
affiliation→name              create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.name_firstname, name.firstname),
                              assign(AG.name_lastname, name.lastname)
affiliation→firstname         create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.firstname, firstname.PCDATA_value)
affiliation→lastname          create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.lastname, lastname.PCDATA_value)
affiliation→affiliation       create(AG), assign(AG.parent_affiliation_iid, affiliation.iid),
                              assign(AG.affiliation_iid, affiliation.iid)



Figures 12-14

[00230] The step of loading the XML data 16 of the document 12 into the tables
20 of the relational database 14 (identified by reference number 60 in Fig. 2) will now be
described in greater detail with respect to Figs. 12-13. Turning to Fig. 12, the step of
loading the document 12 into the tables 20 of the relational database 14 generally involves
traversing the element tree shown in Fig. 14. A simple recursive function is shown by
example in Figs. 12-13, which initiates with step 246 and receives input of the type
shown in the note 248 associated with step 246 in Fig. 12. This process then calls a
function visit_node with the arguments (root, 0). This function essentially assists the
loader 30 in walking through the element tree. Processing then moves to step 250 where
this process ends. Turning to Fig. 13, the steps performed in the visit_node function
are shown in greater detail. Processing moves to step 252, which receives the input
shown in a note 254 associated with step 252 in Fig. 13. Processing then moves to
decision block 256 in which it is determined whether the variable root is equal to a null
value. If so, processing ends. If not, processing moves to step 258 in which a vector
variable action_vector is set equal to a method result of the function called therein.

[00231] Processing then moves to step 260, which calls a function to execute all
actions relative to the particular root and vector of the particular node 302 at issue.
Processing then moves to step 262, which queries the appropriate attributes table to
return a recordset containing all attributes of the particular node 302 being visited.
Processing then moves to step 264, which initiates a loop for each of the attributes of the
particular node and, in turn, processing moves to step 266 in which each attribute of that
node is visited. Processing moves to decision block 268, which determines whether
additional attributes need to be processed for this node 302. If so, processing returns to
step 264. If not, processing moves to step 270.
[00232] At step 270, all of the children of the particular node 302 being visited are
determined. Then processing moves to step 272, which initializes a position variable.
Processing then moves to step 274, which initiates a loop for each child of the particular
node 302 being visited. At step 276, the position variable initialized at step 272 is
incremented. Processing then moves to step 278, which calls a function to visit each of
the nodes 302 comprising children of the particular node (essentially this traverses the
tree in a style shown in Fig. 14). Next, each of the links corresponding to the particular
node 302 is visited in step 280. Processing then moves to decision block 282 which
determines whether additional children need to be processed for this node 302. If yes,
processing returns to step 274. If no, processing terminates at step 284.

[00233] In general, to load an XML document 12, a created XML tree, such as that
shown in Fig. 14 (although this could include any tree-like structure, e.g., a parser tree in
addition to the DOM model shown in Fig. 14), is traversed depth-first, and the
pattern-mapping table 36 collected during the mapping of the DTD 18 is used to determine the
disposition of each node 302 and its data.
[00234] All of the nodes 302 are visited to the lowest levels of the tree model 300
and all of the edges 304 are visited on the way up to the root node (number 1). For every
node 302 (or link), its node 302 (or link) pattern is retrieved and the pattern mapping
table 36 is queried for the corresponding actions on the tables 20 of the relational database
14. The following listing depicts pseudo-code for the loading algorithm.

MODULE data_loading

VARIABLES:
    PatternMappingTable pmt;
    RelationalTables rt;
    XMLTree xt;

BEGIN
    visitNode(xt.getRoot());
END

PROCEDURE visitNode
IN:
    Node n;
BEGIN
    IF n is NULL
        return;
    ENDIF

    GET action list from the pmt based on the pattern of Node n.
    FOR EACH action
        doAction(action);
    END FOR

    FOR ALL the attributes a of Node n
        GET action list from the pmt based on
        pattern attribute a.
        FOR EACH action
            doAction(action);
        END FOR
    END FOR

    FOR ALL the children d of Node n
        visitNode(d);
        visitLink(n, d);
    END FOR
END visitNode

PROCEDURE visitLink
IN:
    Node from, to;
BEGIN
    GET action list from the pmt based on
    link pattern from→to.
    FOR EACH action
        doAction(action);
    END FOR
END visitLink

PROCEDURE doAction
IN:
    Action act;
BEGIN
    IF action is create(T)
        CREATE a new tuple in that
        table T with the iid and order
    ELSE IF action is update(T.A)
        UPDATE the current tuple of the
        corresponding table T with the
        value of that attribute A.
    ELSE IF action is assign(T.A, S.B)
        UPDATE the last tuple in table T
        field A with value in the
        last tuple of table S column B.
    ELSE IF action is decompose(T.A)
        CREATE multiple tuples in the
        table T so that
        each attribute has a single value.
    END IF
END doAction

END MODULE

[00235] After the pattern-mapping table 36 is generated, the pattern-mapping table
36 can be used to load the XML data 16 into the relational schema 22.

[00236] The XML document 12 shown herein relating to Example 1 and as shown
as a tree structure in Fig. 14 can be used as an example of how the relational database 14
can be loaded with the data 16 including the step of traversing the node tree 300 shown in
Fig. 14.

[00237] First (and with reference to Fig. 14 and the pattern-mapping table 36
generated in accordance with Example 1), node 1 is encountered, which is an article type
node. In response, the element pattern "article" is queried in the pattern mapping table 36.
From the pattern mapping table 36, a recordset result may be returned which may
include "create(article)". This implicates the loading action discussed earlier;
therefore, one new tuple is created in table "article" with two fields: iid: 1, and
order: 1.

[00238] Next, the first child of node 1 is visited, which is of type "title". Again,
the pattern mapping table 36 is queried, which would return the action
"create(title)". A new tuple is thereby created in table "title", with two fields:
iid: 1 and order: 2. Then, the child of this node is visited, which is node 3 of type
PCDATA. Again the pattern mapping table 36 is queried, and, responsively, a new tuple is
created in the PCDATA table with three fields, including iid: 1 and order: 3. Then, any value
attributes of the node are visited. By again querying the attribute portion of the pattern
mapping table 36, the value "XML Relation Mapping" will be filled into the "PCDATA"
table in the column "value".

[00239] Then, the link between node 3 and node 2 is visited, which is the link
pattern "title→PCDATA". The corresponding action "assign(title.PCDATA_value,
PCDATA.value)" would be returned. Therefore, the "value" field of this tuple in
PCDATA, i.e., "XML Relation Mapping", is placed into the "title" table, so that the field in table
"title" is updated with the value "XML Relation Mapping".

[00240] Then, the link between node 2 and node 1 is visited, upon which the link
pattern "article→title" would be returned from the pattern mapping table 36. The
corresponding action is "assign(article.title, title.PCDATA_value)". Hence, the
"title" field in the tuple in the "article" table is updated with the value "XML Relation
Mapping". Of course, the loading of the remainder of the nodes in the DOM tree 300 in
Fig. 14 would follow in due course to load the entire contents of the data 16 in the
document 12.
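
The walkthrough above can be reproduced with a short Python sketch. It is an illustration under several assumptions, not the loader 30 itself: the pattern-mapping table is a hand-written dictionary holding only the six patterns the walkthrough touches, link patterns are keyed as "parent->child", the relational tables are in-memory lists, the "current" tuple of a table is its last row, and element text is treated as a PCDATA child with a "value" attribute. The names PMT, do_action, visit and as_node are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of the pattern-mapping table: pattern -> actions.
PMT = {
    "article": [("create", "article")],
    "title": [("create", "title")],
    "PCDATA": [("create", "PCDATA")],
    "PCDATA.value": [("update", "PCDATA", "value")],
    "title->PCDATA": [("assign", "title", "PCDATA_value", "PCDATA", "value")],
    "article->title": [("assign", "article", "title", "title", "PCDATA_value")],
}

tables = {}  # table name -> list of rows; the last row is the current tuple
order = 0    # document-order counter used for the "order" field


def do_action(action, value=None):
    """Execute one create, update or assign action."""
    kind, table = action[0], action[1]
    if kind == "create":
        rows = tables.setdefault(table, [])
        rows.append({"iid": len(rows) + 1, "order": order})
    elif kind == "update":
        tables[table][-1][action[2]] = value
    elif kind == "assign":
        _, _, column, src_table, src_column = action
        tables[table][-1][column] = tables[src_table][-1][src_column]


def visit(name, attributes, children):
    """Visit a node, then its attributes, then each child and its link."""
    global order
    order += 1
    for action in PMT.get(name, []):
        do_action(action)
    for attr, value in attributes.items():
        for action in PMT.get(f"{name}.{attr}", []):
            do_action(action, value)
    for child in children:
        visit(*child)
        for action in PMT.get(f"{name}->{child[0]}", []):
            do_action(action)


def as_node(element):
    """Convert an ElementTree element into (name, attributes, children)."""
    children = []
    if element.text and element.text.strip():
        children.append(("PCDATA", {"value": element.text.strip()}, []))
    children.extend(as_node(child) for child in element)
    return (element.tag, dict(element.attrib), children)


doc = ET.fromstring("<article><title>XML Relation Mapping</title></article>")
visit(*as_node(doc))
```

After the call, the single row of the "article" table carries iid 1, order 1 and the title "XML Relation Mapping", while the "title" and "PCDATA" rows carry orders 2 and 3, matching the sequence described in the preceding paragraphs.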

[00241] It should be noted that, because the elements of the data are the basic units
in the XML document 12, the system 10 should still store the data of the corresponding
elements into their tables during the loading process. However, in the case of inline
attributes, if the element is inlined into an attribute, then the table of that element is no
longer used after the loading. Therefore, those unused tables could be deleted after loading
the data. The following table shows the result of the data loading in the relational tables:


booktitle 



The XML Handbook



1 XML Relation Mapping 



contactauthors.iid
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Patti eoerrieri 



contactauthors



parent_editor_iid



monograph_iid



Monograph 



Repository Support for Metadata-based Legacy Migration



author . iid 



editor idd 



author 
iid I order 



36 



id 



me firstname 



Xin 
Gail 

Wang-chien
Sandra 
Charles F. 

This field is filled using the technique for handling multi-level
grouping, which is not addressed in this application.



! lastname 



Zhang 

Mitchell 

Lee 

Heiler 

Goldfarb

Prescod



parent 



iid 
1 


parent article. xid 
1 


1 


affiliation. a 


dd 1 


2 


a 


2 






3 
4 


1 
1 


3 




1 





contactauthors.authorIDs

parent_contactauthors_iid



affiliation 
1 iid I Order 



iid | editor.iid

PCDATA.value



booktitle



AG (continued) 



last- 


paxent- 
srelated- 
work . idd 


book 
.Idd 
.idd 


contact- 

authoirs 

.idd 


graph 
.idd 


editor 
.idd 


author 
.idd 










1 







[00242] The following summarizes the metadata tables 34 used and described in
this application and discusses these tables as they relate to the invention.



Table                Description

Storing Original DTD
DTDM-Item            Stores the Elements and Groups of the original DTD.
DTDM-Attribute       Stores the Attributes of the Elements or Groups of the
                     original DTD.
DTDM-Nesting         Stores the Nesting relationships of the original DTD.

Storing Converted DTD
IM-Item              Stores the Elements and Groups of a converted DTD.
IM-Attribute         Stores the Attributes of the Elements or Groups of a
                     converted DTD.
IM-Nesting           Stores the Nesting relationship of a converted DTD.

Storing Table Schemas
TS-JC                Stores the Join Constraint information.

Keeping Track of Mapping from Original DTD into Converted DTD
Pattern              Keeps the patterns.
Pattern-Mapping      Keeps the mapping from patterns on the table schema.

[00243] By way of further summary, the present invention contemplates an XML
to relational database mapping approach. This approach is based on storing the DTD 18
into metadata tables 34 and manipulating these tables 34 to generate metadata tables
describing a relational schema, and then to generate the relational schema 22 therefrom.
Several techniques have been discussed, e.g., identifying sets and inline attributes, to
identify more aspects of XML document generation, and to refine the metadata generated
therein. Benefits of the present invention include:

[00244] Integrity. The DTD 18 is stored in metadata tables 34. This ensures an
integrity constraint when modifying the DTD 18.

[00245] Simplicity. The loading of an XML document 12 is automatic. The
pattern-mapping table 36 keeps track of the items and links during all of the steps of refinement
of the metadata. Hence, the loading process can load the XML document 12 directly
based upon the pattern-mapping table 36.

[00246] Capture of semantics and a more user-friendly query interface. The
identifying-sets refinement further breaks multiple-value attributes out into
tables, in which a user can access each value instead of being able to access
only the whole value. For example, by breaking an IDREFS-type attribute into
tables with an IDREF-typed column, normal joins can be performed to determine
the referenced elements. Not only does this approach have the benefits of using
an element type as a table name, but the inline attribute refinement also
determines the extra attributes of some element types, which were originally
treated as a single element type. For example, booktitle becomes an attribute
of book, instead of two tables titled booktitle and book being connected by
joins. This keeps better semantics not only in the mapping but also in the
refinement of the mapping. The metadata has thereby been further improved for a
better query interface (e.g., being able to query title instead of
title.PCDATA_value).
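
The effect of the two refinements can be pictured with a small Python sketch under stated assumptions. The table and column names (author, book, book_authors, idref) and the sample data are hypothetical and are not taken from the application; SQLite stands in for the relational database. The sketch stores an IDREFS-style attribute as one row per referenced ID, so that an ordinary join resolves the referenced elements, and it keeps booktitle as a column of book rather than as a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (iid INTEGER PRIMARY KEY, id TEXT, name TEXT);
    -- booktitle is inlined as a column of book (inline attribute refinement).
    CREATE TABLE book (iid INTEGER PRIMARY KEY, booktitle TEXT);
    -- One row per referenced ID instead of one whitespace-separated string.
    CREATE TABLE book_authors (
        book_iid INTEGER REFERENCES book(iid),
        idref    TEXT
    );
""")


def load_book(conn, booktitle, authors_idrefs):
    """Insert a book and split its IDREFS value into individual IDREF rows."""
    cur = conn.execute("INSERT INTO book (booktitle) VALUES (?)", (booktitle,))
    book_iid = cur.lastrowid
    for ref in authors_idrefs.split():
        conn.execute("INSERT INTO book_authors VALUES (?, ?)", (book_iid, ref))
    return book_iid


conn.executemany(
    "INSERT INTO author (id, name) VALUES (?, ?)",
    [("a1", "Ann"), ("a2", "Bob"), ("a3", "Cy")],
)
load_book(conn, "XML Basics", "a1 a3")

# A normal join now resolves every referenced author.
rows = conn.execute("""
    SELECT b.booktitle, a.name
    FROM book b
    JOIN book_authors r ON r.book_iid = b.iid
    JOIN author a ON a.id = r.idref
    ORDER BY a.name
""").fetchall()
```

Had the attribute been kept as the single string "a1 a3", the same question would require string matching rather than a join.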

[00247] A reusable and scalable method. The mapping approach is anticipated to
be useful for the reuse of DTDs and XML documents that have many different DTDs
since this mapping approach is performed by the queries on the metadata tables.

[00248] Flexibility. Different XML-to-relational mappings could be represented
by manipulations on the metadata tables. Hence, the inventive approach extends
the automatic loading described herein to work for different kinds of mapping
algorithms and even for different types of databases.

[00249] Extensible. The present invention has a great deal more potential than
merely mapping and loading data, as it could also be used for optimizing query
performance, extending the query capability on the XML document, reconstructing
the XML data for a different DTD, and integrating with other XML data or
relational data.

[00250] It has been determined by the applicants that a DTD can specify
important properties of an XML document, including grouping, nesting,
occurrence, element referencing, etc. In order to capture these rich
relationships between elements into the relational schema of a relational
database, the metadata model proposed herein consists of item, attribute, and
nesting relationships. The inventive mapping approach, based on the metadata
tables generated herewith, can successfully capture the relationships between
elements as constraints in a relational database. For example, the nesting
properties are captured as foreign key constraints between different tables.
The metadata approach also makes the automatic loading of XML documents into
relational tables possible and quite easy.
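
As a hedged illustration of a nesting property captured as a foreign key constraint, the following Python sketch uses SQLite with hypothetical tables book and chapter (names and columns are assumptions, not taken from the application). A chapter row that names a non-existent parent book is rejected by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE book (iid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE chapter (
        iid             INTEGER PRIMARY KEY,
        heading         TEXT,
        -- The nesting "book contains chapter" becomes a foreign key.
        parent_book_iid INTEGER REFERENCES book(iid)
    );
""")
conn.execute("INSERT INTO book (iid, title) VALUES (1, 'XML Basics')")
conn.execute("INSERT INTO chapter (heading, parent_book_iid) VALUES ('Intro', 1)")


def try_insert_orphan(conn):
    """Try to insert a chapter whose parent book does not exist."""
    try:
        conn.execute(
            "INSERT INTO chapter (heading, parent_book_iid) VALUES ('Lost', 99)"
        )
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = try_insert_orphan(conn)
chapter_count = conn.execute("SELECT COUNT(*) FROM chapter").fetchone()[0]
```

The constraint lives in the schema, so any loader writing to these tables is held to the nesting structure of the DTD.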

[00251] While the invention has been specifically described in connection with
certain specific embodiments thereof, it is to be understood that this is by way
of illustration and not of limitation, and the scope of the appended claims
should be construed as broadly as the prior art will permit.

Claims

What is claimed is:

1. A method for generating a schema (22) for a relational database (14)
corresponding to a document (12) having a document-type definition (18) and data
(16) complying with the document-type definition (18), the document-type
definition (18) having content particles representative of the structure of the
document data (16), as well as loading the data into the relational database
(14) in a manner consistent with the relational schema (22), the method
comprising the steps of:

extracting (24) metadata (34) representative of the document-type definition
(18) from the document-type definition (18);

generating (28) the schema (22) for the relational database (14) from the
metadata (34), wherein at least one table (20) is thereby defined in the
relational database (14) corresponding to at least one content particle of the
document-type definition (18) via the metadata (34); and

loading (30) the document data (16) into the at least one table (20) of the
relational database (14) according to the relational schema (22) in a manner
driven by the metadata (34).

2. The method of claim 1 wherein the extracting (24) step further comprises
the step of generating (28) an item metadata table (90) corresponding to element
type content particles in the document-type definition (18).

3. The method of claim 2 wherein the extracting step further comprises the
step of creating at least one default item in the item metadata table (90).

4. The method of claim 3 wherein the extracting step further comprises the
step of updating the item metadata table (90) with each of the element type
content particles of the document-type definition (18).

5. The method of claim 4 wherein the extracting step further comprises the
step of generating (28) an attribute metadata table (92) corresponding to
attribute type content particles in the document-type definition (18).

6. The method of claim 5 wherein the extracting step further comprises the
step of creating a default attribute value in the attribute metadata table (92)
corresponding to attributes of element types in the document-type definition
(18).

7. The method of claim 6 wherein the extracting step further comprises the
step of updating the attribute metadata table (92) with each of the attribute
type content particles of each element type of the document-type definition
(18).

8. The method of claim 7 wherein the extracting step further comprises the
step of generating (28) a nesting metadata table (94) for storing data items
corresponding to nesting relationships implied in the document-type definition
(18).

9. The method of claim 8 wherein the extracting step further comprises the
step of generating (28) a row in the nesting metadata table (94) corresponding
to each relationship between items identified in the item metadata table (90).

10. The method of claim 9 wherein the generated nesting table (94) row
indicates the cardinality between a pair of items.

11. The method of claim 10 wherein the cardinality is one of one-to-one and
one-to-many.

12. The method of claim 8 wherein the generated nesting table (94) row
indicates a relationship between a parent item and a child item.

13. The method of claim 8 wherein the generated nesting table (94) row
indicates a relative position of a child item with respect to other items in a
definition of the corresponding parent item.

14. The method of claim 7 wherein the generating (28) step further comprises
the step of creating at least one table in the schema (22) of the relational
database corresponding to at least one row of the metadata item table (90).

15. The method of claim 14 wherein the generating (28) step further comprises
generating (28) at least one default field in the table of the schema (22).

16. The method of claim 15 wherein the generating (28) step further comprises
the step of altering the schema (22) of the relational database to add at least
one column to the at least one table in the relational database (14) schema (22)
corresponding to each row of the metadata attribute table (92).

17. The method of claim 16 wherein the generating (28) step further comprises
the step of altering the tables in the schema (22) of the relational database to
add columns representing links between tables (20) of the relational database
schema (22) corresponding to each relationship identified in each row of the
metadata nesting table (94).

18. The method of claim 17 wherein the generating (28) step further comprises
the step of altering the tables in the schema (22) of the relational database
(14) by adding a foreign key to a parent table if the identified relationship is
a one-to-one relationship.

19. The method of claim 18 wherein the generating (28) step further comprises
the step of altering the tables in the schema (22) of the relational database
(14) by adding a foreign key to a child table if the identified relationship is
a one-to-many relationship.

20. The method of claim 19 and further comprising the step of initializing a
link table (36).

21. The method of claim 19 and further comprising the step of determining
whether each item in the metadata nesting table (94) contains a group type.

22. The method of claim 19 and further comprising the step of initializing a
pattern-mapping table (36).

23. The method of claim 22 and further comprising the step of directly
mapping a link into the link table (36) for each item in the metadata nesting
table (94) that does not contain a group type.

24. The method of claim 23 and further comprising the step of creating an
additional link table (36) containing a mapping of a link pattern for each group
type identified in the metadata item table (90).

25. The method of claim 24 and further comprising the step of creating a
create tuple loading action in the pattern mapping table (36) associated with a
particular pattern corresponding to each item in the item metadata table (90).

26. The method of claim 25 wherein the loading (30) step further comprises
the step of creating an update tuple loading action in the pattern mapping table
(36) associated with a particular pattern corresponding to each attribute in the
attribute metadata table (92).

27. The method of claim 26 wherein the loading step further comprises the
steps of:

creating a create tuple loading action in the pattern mapping table (36)
associated with a particular pattern corresponding to each group in a link; and

creating an assign action tuple loading action in the pattern mapping table (36)
associated with a particular pattern corresponding to each pair in the same
link;

corresponding to each link in the link pattern table (36).

28. The method of claim 27 wherein the loading step further comprises the
step of forming a tree structure (300) with the document data (16).

29. The method of claim 28 wherein the loading step further comprises the
step of traversing the formed tree (300) and updating the at least one
relational database (14) table according to the rows of the pattern mapping
table (36).

30. The method of claim 1 and further comprising the step of optimizing (26)
the metadata.

31. The method of claim 30 wherein the optimizing (26) step further
comprises the step of eliminating duplicate particle references in the metadata
(34).

32. The method of claim 31 wherein the optimizing (26) step further
comprises the step of simplifying references to corresponding elements, links
and attributes in the metadata (34).

33. The method of claim 32 wherein the optimizing (26) step further
comprises the step of inlining particular attributes of the metadata (34).

34. The method of claim 1 wherein the document (12) is an XML document.

35. The method of claim 1 wherein the document-type definition (18) is a DTD.

36. The method of claim 1 wherein the data (16) is tagged data.

37. A system (10) for generating (28) a schema (22) for a relational database
(14) corresponding to a document (12) having a document-type definition (18) and
data (16) complying with the document-type definition (18), the document-type
definition (18) having content particles representative of the structure of the
document data (16), as well as loading the data (16) into the relational
database (14) in a manner consistent with the relational schema (22), the system
comprising:

an extractor (24) adapted to read a document-type definition (18) that extracts
metadata (34) representative of the document-type definition (18) from the
document-type definition (18);

a generator (28) operably interconnected to the extractor (24) for generating
(28) the schema (22) for the relational database (14) from the metadata (34),
wherein at least one table (20) is thereby defined in the relational database
(14) corresponding to at least one content particle of the document-type
definition (18) via the metadata (34); and

a loader (30) operably interconnected to the generator (28) for loading the
document data (16) into the at least one table (20) of the relational database
(14) according to the relational schema (22) in a manner driven by the metadata
(34).

38. The system of claim 37 wherein the extractor (24) generates an item
metadata table (90) for storing data items corresponding to element type content
particles in the document-type definition (18).

39. The system of claim 38 wherein the extractor (24) creates at least one
default item in the item metadata table (90).

40. The system of claim 39 wherein the extractor (24) generates a row in the
item metadata table (90) corresponding to each of the element type content
particles of the document-type definition (18).

41. The system of claim 40 wherein the extractor (24) generates an attribute
metadata table (92) corresponding to attribute type content particles in the
document-type definition (18).

42. The system of claim 41 wherein the extractor (24) generates a row in the
attribute metadata table (92) corresponding to each of the attribute type
content particles of the document-type definition (18).

43. The system of claim 42 wherein the extractor (24) generates a nesting
metadata table (94) for storing data items corresponding to nesting relationship
implied in the document-type definition (18).

44. The system of claim 43 wherein the extractor generates a row in the
nesting metadata table (94) corresponding to each relationship identified in the
document-type definition (18) between items identified in the item metadata
table (90).



45. The system of claim 44 wherein the generator (28) creates at least one
table in the relational database schema (22) of the relational database (14)
corresponding to data in the metadata item table (90).

46. The system of claim 45 wherein the generator (28) alters the schema (22)
of the relational database to add a column to the at least one table of the
relational database (14) schema (22) corresponding to each row of the metadata
attribute table (92).

47. The system of claim 46 wherein the generator (28) alters the tables in the 
schema (22) of the relational database to add columns representing links between tables of 
the relational database schema (22) corresponding to each relationship identified in each 
row of the metadata nesting table (94). 

48. The system of claim 47 wherein the generator (28) alters the tables in the 
schema (22) of the relational database (14) by adding a foreign key to a parent table if a 
relationship identified between a pair of tables is a one-to-one relationship. 

49. The system of claim 48 wherein the generator (28) alters the tables in the 
schema (22) of the relational database by adding a foreign key to a child table if a 
relationship identified between a pair of tables (20) is a one-to-many relationship. 

50. The system of claim 37 and further comprising a link table (36). 

51. The system of claim 50 wherein the system (10) determines whether each 
item in the metadata nesting table (94) contains a group type content particle. 

52. The system of claim 51 and further comprising a pattern-mapping table
(36) in an initialized state.

53. The system of claim 52 wherein the system (10) directly forms a link in
the link table (36) for each item in the metadata nesting table (94) that does
not contain a [...]

54. The system of claim 53 wherein the loader (30) creates an additional link
table (36) containing a mapping of a link pattern for each group type identified
in the metadata item table (90).

55. The system of claim 54 wherein the system (10) retrieves a preselected set
of rows corresponding to each item in the metadata item table (90).

56. The system of claim 55 wherein the system (10) creates a create tuple
loading action in the pattern mapping table (36) associated with a particular
pattern corresponding to each item in the item metadata table (90).

57. The system of claim 56 wherein the system (10) creates an update tuple 
loading action in the pattern mapping table (36) associated with a particular pattern 
corresponding to each attribute in the attribute metadata table (92). 

58. The system of claim 57 wherein the system:

creates a create tuple loading action in the pattern mapping table (36)
associated with a particular pattern corresponding to each group in a link; and

creates an assign action tuple loading action in the pattern mapping table (36)
associated with a particular pattern corresponding to each pair in the same
link;

wherein each created action corresponds to each link in the link pattern table.

59. The system of claim 58 wherein the loader (30) forms a tree structure 
(300) with the document data (16). 

60. The system of claim 59 wherein the loader (30) traverses the formed tree
structure (300) and updates the at least one relational database (14) table (20)
according to the rows of the pattern mapping table (36).

61. The system of claim 37 and further comprising an optimizer (26) for
refining the metadata.

62. The system of claim 61 wherein the optimizer (26) eliminates duplicate
particle references in the metadata (34).

63. The system of claim 62 wherein the optimizer (26) simplifies references to
corresponding elements, links and attributes in the metadata (34).

64. The system of claim 37 wherein the document (12) is an XML document.

65. The system of claim 37 wherein the document-type definition (18) is a DTD.

66. The system of claim 37 wherein the data (16) is tagged data.

67. A system (10) for generating (28) a schema (22) for a relational database
(14) corresponding to a document (12) having a document-type definition (18) and
data (16) complying with the document-type definition (18), the document-type
definition (18) having content particles representative of the structure of the
document data (16), as well as loading the data (16) into the relational
database (14) in a manner consistent with the relational schema (22), the system
comprising:

an extractor (24) adapted to read a document-type definition (18) that extracts
metadata (34) representative of the document-type definition (18) from the
document-type definition (18), wherein the extractor stores the metadata (34) in
at least three tables comprising a metadata item table (90) containing metadata
(34) representative of element types in the document-type definition (18), a
metadata attribute table (92) containing metadata (34) representative of
attributes in the document type definition (18), and a metadata nesting table
(94) containing metadata (34) representative of nesting relationships between
particles in the document type definition (18).

68. The system of claim 67 and further comprising a pattern-mapping table
(36) initially constructed in an initialized state.

69. The system of claim 68 wherein the pattern mapping table (36) is loaded 
with actions indicative of relationships between the data (16) and the document-type 
definition (18). 

70. The system of claim 67 and further comprising a generator (28) operably
interconnected to the extractor (24) for generating (28) the schema (22) for the
relational database (14) from the metadata (34), wherein at least one table (20)
is thereby defined in the relational database (14) corresponding to at least one
content particle of the document-type definition (18) via the metadata (34).

71. The system of claim 70 wherein the generator (28) forms a table (20) with
at least one default field in the relational database (14) for each item
contained in the metadata item table (90).

72. The system of claim 71 wherein the generator (28) forms a column in a
corresponding table (20) in the relational schema (22) corresponding to each
attribute in the metadata attribute table (92) linked to an item in the metadata
item table (90).

73. The system of claim 72 wherein the generator (28) forms a link between
tables in the relational database corresponding to nesting relationships
contained in the metadata nesting table (94).




74. The system of claim 67 and further comprising a loader (30) operably
interconnected to the generator (28) for loading the document data (16) into the
at least one table (20) of the relational database (14) according to the
relational schema (22) and driven by the metadata (34).
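
Purely as a sketch of the extracting step recited above, and not as the claimed implementation, the following Python code reads a very small DTD and produces item, attribute, and nesting rows. It assumes a flat DTD whose content models are simple comma-separated sequences with no nested groups or choices; the regular expressions, the function name extract_metadata, the tuple layouts, and the sample DTD are all hypothetical.

```python
import re

# Matches declarations such as <!ELEMENT book (booktitle, chapter+)>.
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^()]*)\)\s*>")
# Matches single-attribute declarations such as <!ATTLIST book isbn CDATA #REQUIRED>.
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\w+)\s+(\w+)\s+(\w+)\s+(\S+)\s*>")


def extract_metadata(dtd_text):
    """Return (items, attributes, nesting) for a flat, group-free DTD."""
    items = {"PCDATA": 1}          # default item, created before any element
    attributes, nesting = [], []
    declarations = ELEMENT_RE.findall(dtd_text)
    # First pass: one item per declared element type.
    for name, _ in declarations:
        items.setdefault(name, len(items) + 1)
    # Second pass: one nesting row per child particle, with its position.
    for parent, content in declarations:
        particles = [p.strip() for p in content.split(",")]
        for position, particle in enumerate(particles):
            child = particle.rstrip("+*?").lstrip("#")
            ratio = "1:n" if particle[-1] in "+*" else "1:1"
            optional = particle[-1] in "*?"
            nesting.append((items[parent], items[child], ratio, optional, position))
    # Third pass: one attribute row per attribute declaration.
    for elem, attr, attr_type, default in ATTLIST_RE.findall(dtd_text):
        attributes.append((items[elem], attr, attr_type, default))
    return items, attributes, nesting


SAMPLE_DTD = """
<!ELEMENT book (booktitle, chapter+)>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT chapter (#PCDATA)>
<!ATTLIST book isbn CDATA #REQUIRED>
"""

items, attributes, nesting = extract_metadata(SAMPLE_DTD)
```

Each nesting row records the parent item, the child item, the cardinality, whether the child is optional, and the child's position in the parent's definition, which are the properties the dependent claims attribute to the nesting metadata table.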

Fig. 1

Fig. 2 (flowchart box labels, in reading order)

Store DTD into DTDM Tables
Create Table Schema from the DTDM Tables
XML Data Loading
Create and Fill DTDM_Item Table
Create Relational [...]
Load XML Document into Tables
Create and Fill [...] Table
Create Columns in Relational Tables (for Attributes)
Create and Fill DTDM_Nesting Table
Add Foreign Keys (for Nesting [...])
Init Pattern-Mapping Table

Fig. 3 (flowchart text)

Create two default items, called "PCDATA" and "ANY_GROUP":

    INSERT INTO DTDM_Item VALUES
        (<Unique_ID>, 'PCDATA', 'PCDATA')
    INSERT INTO DTDM_Item VALUES
        (<Unique_ID>, 'ANY_GROUP', 'Group_Choice')

For each element type:

    INSERT INTO DTDM_Item VALUES
        (<Unique_ID>, <Element_Type_Name>, <Element_Type_Type>)

Reference numeral legible in the figure: 66.

Fig. 4 (flowchart text)

Get Item_ID of PCDATA from DTDM_Item table
Create default attribute "value" for PCDATA:

    INSERT INTO DTDM_Attribute VALUES
        (<Unique_ID>, <Element_ID>, 'Value', 'CDATA', '#IMPLIED')

Get (next) Element Type in the DTD
Get Item ID of this Element Type from DTDM_Item table
Get (next) Attribute of this Element Type and add Attribute to DTDM_Attribute
table:

    INSERT INTO DTDM_Attribute VALUES
        (<Unique_ID>, <Item_ID>, <Attribute_ID>, <Attribute_Type>,
         <Attribute_Default_Value>)

Reference numerals legible in the figure: 74, 76, 84.

Fig. 5 (flowchart text)

E.g., put ANY_GROUP and PCDATA into the tables, and also initialize the nesting
relationships for the ANY_GROUP to all the available ELEMENTS.

    INSERT INTO DTDM_Item VALUES
        (<Unique_ID>, <Generate_Group_Name>, 'GROUP_CHOICE')

    INSERT INTO DTDM_Nesting VALUES
        (<Unique_ID>, <New_Group_ID>, <PCDATA_ID>, '1:1', 'true',
         <Position_Of_Current_Element_Type>)

    INSERT INTO DTDM_Nesting VALUES
        (<Unique_ID>, <Current_Element_Type_ID>, <Any_Group_ID>,
         '1:n', 'true', 0)

    INSERT INTO DTDM_Nesting VALUES
        (<Unique_ID>, <Current_Element_Type_ID>, <PCDATA_ID>,
         [...], 'false', 0)

Reference numerals legible in the figure: 404, 438.

Fig. 5A (flowchart text)

Store the Nesting Relationship between current Child and the Parent into the
DTDM_Nesting table:

    INSERT INTO DTDM_Nesting VALUES
        (<Unique_ID>, <element_id>, <ref_id>,
         <Child.Ratio>, <Child.Optional>, <Child.Position>)

Stop

Reference numeral legible in the figure: 464.

Fig. 7 (flowchart text)

Get all attributes from the DTDM-Attribute table and their associated items from
the DTDM-Item table:

    SELECT A.Name AS Attribute_Name, type AS Attribute_Type,
           I.Name AS Item_Name
    FROM DTDM_Attribute A, DTDM_Item I
    WHERE a.pid = [...]

Reference numeral legible in the figure: 134.

Fig. 8 (flowchart text)

Get IDs of Items participating in nesting relationships
Get (next) nesting relationship:

    SELECT f.name AS from_name, t.name AS to_name, Ratio
    FROM DTDM_Nesting n, DTDM_Item f, DTDM_Item t
    WHERE f.id = FromID AND t.id = ToID

Ratio is 1:1 or 1:n?

Put foreign key in [...]:

    ALTER TABLE from_name
        ADD (<to_name>_iid_<index>)

Put foreign key in child table:

    ALTER TABLE to_name
        ADD (parent_<from_name>_iid_<index> INTEGER)

More Nesting Relationships?

Reference numerals legible in the figure: 152, 158.
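
The branch on the ratio shown in this figure might be sketched in Python as follows. This is an illustration under stated assumptions: the function name foreign_key_statements is hypothetical, the row's list position is used as the <index> (the figure does not say how the index is chosen), the table names are assumed to be trusted identifiers, and SQLite's ADD COLUMN syntax replaces the figure's ADD (...) form.

```python
import sqlite3


def foreign_key_statements(nesting_rows):
    """Build one ALTER TABLE statement per (from_name, to_name, ratio) row."""
    statements = []
    for index, (from_name, to_name, ratio) in enumerate(nesting_rows):
        if ratio == "1:1":
            # One-to-one: the key column is added to the parent table.
            statements.append(
                f"ALTER TABLE {from_name} "
                f"ADD COLUMN {to_name}_iid_{index} INTEGER"
            )
        elif ratio == "1:n":
            # One-to-many: the key column is added to the child table.
            statements.append(
                f"ALTER TABLE {to_name} "
                f"ADD COLUMN parent_{from_name}_iid_{index} INTEGER"
            )
        else:
            raise ValueError(f"unexpected ratio: {ratio!r}")
    return statements


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (iid INTEGER PRIMARY KEY);
    CREATE TABLE booktitle (iid INTEGER PRIMARY KEY);
    CREATE TABLE chapter (iid INTEGER PRIMARY KEY);
""")
statements = foreign_key_statements(
    [("book", "booktitle", "1:1"), ("book", "chapter", "1:n")]
)
for statement in statements:
    conn.execute(statement)

# Column names are the second field of each PRAGMA table_info row.
book_columns = [row[1] for row in conn.execute("PRAGMA table_info(book)")]
chapter_columns = [row[1] for row in conn.execute("PRAGMA table_info(chapter)")]
```

Placing the key in the child table for a one-to-many relationship lets any number of child rows point back at the same parent row.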

Fig. 9 (flowchart text)

Init Pattern Mapping Table
Initialize the Link Table
Initialize the Pattern Mapping Table

Reference numeral legible in the figure: 168.

Fig. 10 (flowchart text)

    CREATE TABLE Link0 AS
    SELECT A.FromID, B.ToID, A.ToID AS Group0
    FROM DTDM_Nesting A, DTDM_Nesting B, DTDM_Item C
    WHERE A.ToID = B.FromID
      AND A.ToID = C.ID
      AND C.type LIKE 'GROUP_%'

    SELECT COUNT(*)
    FROM DTDM_Item
    WHERE type LIKE 'Group_%'
      AND ID IN (SELECT FromID FROM Link0)

    CREATE TABLE Link<iteration_number> AS
    SELECT A.FromID, B.ToID, Group0 [...] Group<iteration_number>,
           A.ToID AS Group<iteration_number>
    FROM DTDM_Nesting A, Link<iteration_number-1> B, DTDM_Item C
    WHERE A.ToID = B.FromID
      AND A.ToID = C.ID
      AND C.type LIKE 'Group_%'

    SELECT COUNT(*)
    FROM DTDM_Item
    WHERE type LIKE 'GROUP_%'
      AND ID IN (SELECT FromID FROM Link<iteration_number> table)

Details are described in the sub activity diagram in Fig. 11.
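
To make the first query of this figure concrete, the following Python sketch runs the Link0 statement against a tiny, hypothetical data set in SQLite. The sample items (book, G1, title, subtitle), the 'ELEMENT' type value, and the two-column DTDM_Nesting table are assumptions chosen only to exercise the query; they are not taken from the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DTDM_Item (ID INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE DTDM_Nesting (FromID INTEGER, ToID INTEGER);

    -- book nests group G1, and G1 nests title and subtitle.
    INSERT INTO DTDM_Item VALUES
        (1, 'book', 'ELEMENT'), (2, 'G1', 'GROUP_CHOICE'),
        (3, 'title', 'ELEMENT'), (4, 'subtitle', 'ELEMENT');
    INSERT INTO DTDM_Nesting VALUES (1, 2), (2, 3), (2, 4);

    -- Link0 pairs each parent with the children reached through one group.
    CREATE TABLE Link0 AS
        SELECT A.FromID, B.ToID, A.ToID AS Group0
        FROM DTDM_Nesting A, DTDM_Nesting B, DTDM_Item C
        WHERE A.ToID = B.FromID
          AND A.ToID = C.ID
          AND C.type LIKE 'GROUP_%';
""")

links = conn.execute(
    "SELECT FromID, ToID, Group0 FROM Link0 ORDER BY ToID"
).fetchall()

# The loop in the figure continues while some link still starts at a group.
remaining_groups = conn.execute("""
    SELECT COUNT(*) FROM DTDM_Item
    WHERE type LIKE 'GROUP_%'
      AND ID IN (SELECT FromID FROM Link0)
""").fetchone()[0]
```

In this data set every link already starts at an element rather than a group, so the count is zero and no further Link tables would be needed.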

Fig. 11 (flowchart text)

Get all the Links from LinkPattern Table
Get (next) Link

Reference numerals legible in the figure: 218, 220, 222.

Fig. 12 (flowchart text)

INPUT:
    Element root;
    PatternMapping pattern_mapping;

visitNode(root, 0)

Fig. 13 (flowchart text)

PARAMETER: Element root
GLOBAL VARIABLE: PatternMapping pattern_mapping

Vector action_vector =
    pattern_mapping.getActionsOfNodePattern(root.getName())
executeActions(action_vector, root)
Get all the Attributes of this Node root
Get (next) Attribute_node
visitAttribute(attribute_node)
Get all the Children of root
position = 0
Get (next) Child_node
visitNode(Child_node, position)
visitLink(root, Child_node)

Reference numerals legible in the figure: 264, 266, 268, 272, 278, 280.
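
The traversal outlined in this figure can be sketched in Python as a recursive walk over a parsed XML tree. This is a minimal sketch under stated assumptions: the pattern-mapping table is modelled as a plain dictionary from a pattern string to a list of action names, the pattern formats ("book", "book/@isbn", "book/chapter") and action names are hypothetical, the position is assumed to advance by one per child, and the actions are merely recorded in a log rather than executed against a database.

```python
import xml.etree.ElementTree as ET


def visit_node(node, position, pattern_mapping, log):
    """Run the node's actions, then visit its attributes and its children."""
    for action in pattern_mapping.get(node.tag, []):
        log.append((action, node.tag, position))
    for attr_name in node.attrib:
        visit_attribute(node, attr_name, pattern_mapping, log)
    for child_position, child in enumerate(node):
        visit_node(child, child_position, pattern_mapping, log)
        visit_link(node, child, pattern_mapping, log)


def visit_attribute(node, attr_name, pattern_mapping, log):
    """Run the actions registered for an attribute pattern."""
    pattern = f"{node.tag}/@{attr_name}"
    for action in pattern_mapping.get(pattern, []):
        log.append((action, pattern, node.attrib[attr_name]))


def visit_link(parent, child, pattern_mapping, log):
    """Run the actions registered for a parent/child link pattern."""
    pattern = f"{parent.tag}/{child.tag}"
    for action in pattern_mapping.get(pattern, []):
        log.append((action, pattern, None))


pattern_mapping = {
    "book": ["create"],
    "book/@isbn": ["update"],
    "chapter": ["create"],
    "book/chapter": ["assign"],
}
root = ET.fromstring(
    '<book isbn="123"><chapter>Intro</chapter><chapter>More</chapter></book>'
)
log = []
visit_node(root, 0, pattern_mapping, log)
```

Because every action is looked up by pattern, the walk itself stays the same for any DTD; only the contents of the mapping change.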